Libraries tagged by HTML extraction
j0k3r/php-readability
578055 Downloads
Automatic article extraction from HTML
atrox/matcher
86730 Downloads
Powerful XML and HTML matching and data extraction library
dotpack/php-boiler-pipe
4643 Downloads
PhpBoilerPipe. Boilerplate Removal and Fulltext Extraction from HTML pages
einfacharchiv/microdata
231 Downloads
Extract billing data from HTML (supporting Microdata and JSON-LD)
vanry/readability
42 Downloads
Automatic article content extraction from HTML, plus an HTML parser.
tacman/php-readability
40 Downloads
Automatic article extraction from HTML, fork of j0k3r/php-readability
pforret/pf-article-extractor
56 Downloads
PhpArticleExtractor. Boilerplate Removal and Fulltext Extraction from HTML pages
sleimanx2/grawler
296 Downloads
A guided HTML crawler with media meta extraction
ncjoes/pdf-suite
239 Downloads
A high-level wrapper over Poppler-Php for PDF content extraction and conversion using Poppler utils
aspose/pdf
102 Downloads
A powerful library for manipulating and converting PDF files.
anshu-krishna/html-scraper
15 Downloads
A set of PHP classes to simplify data extraction from HTML.
matejch/html_helpers
6 Downloads
Helper class for removing elements and content, and extracting file paths
gregpriday/laravel-zyte-api
30 Downloads
A Laravel package for seamless integration with Zyte's web scraping API, offering functionalities for extracting raw HTML, browser-rendered HTML, and structured article content.
ngfw/webparser
5 Downloads
WebParser is a PHP library that allows developers to parse and query webpages using an ORM-like syntax. It facilitates the extraction of HTML elements by chaining operations such as filtering by ID or class, ordering results, and limiting output. WebParser offers a flexible interface for exploring and extracting data from the web, making it ideal for web scraping and data analysis tasks.
clientbg/php-boiler-pipe
37 Downloads
PhpBoilerPipe. Boilerplate Removal and Fulltext Extraction from HTML pages. Based on dotpack's PHP implementation.