Libraries tagged by content extraction
j0k3r/php-readability
561635 Downloads
Automatic article extraction from HTML
causal/extractor
161431 Downloads
This extension detects and extracts metadata (EXIF / IPTC / XMP / ...) from potentially thousands of different file types (such as MS Word/PowerPoint/Excel documents, PDFs and images) and brings it automatically and natively to TYPO3 when uploading assets. It works with built-in PHP functions but takes advantage of Apache Tika and other external tools for enhanced metadata extraction.
vanry/readability
42 Downloads
Automatic article content extraction from HTML, with an HTML parser.
tacman/php-readability
13 Downloads
Automatic article extraction from HTML, fork of j0k3r/php-readability
manofstrong/sitescrapper
70 Downloads
A package to scrape websites from their sitemaps, extract relevant content from each webpage, and upload it to a database
ncjoes/pdf-suite
239 Downloads
A high level wrapper over Poppler-Php for PDF content extraction and conversion using Poppler utils
xtroo/php-client
13 Downloads
Xtroo PHP Client Library
gregpriday/laravel-zyte-api
30 Downloads
A Laravel package for seamless integration with Zyte's web scraping API, offering functionalities for extracting raw HTML, browser-rendered HTML, and structured article content.
ahadabasi/php-readability
1 Downloads
Automatic article extraction from HTML
matejch/html_helpers
6 Downloads
Helper class for removing elements and content, and extracting file paths
hstanleycrow/easyphparticleextractor
10 Downloads
Free PHP library to extract the main content from an article post or news post, including images and HTML
arania/arania
12 Downloads
Tiny Framework For Web Content Extraction
discommand2/plugin-browser
0 Downloads
Employs web scraping technologies for data extraction and interaction with web content.
teners/laravel-link-preview
1371 Downloads
A Laravel package for extracting link previews with customizable parsers and caching support
ballen/linguist
1983 Downloads
Linguist is a PHP library for parsing strings and extracting prefixed words in content, ideal for working with @mentions, #topics and custom tags.