Libraries tagged by content extraction
j0k3r/php-readability
869769 Downloads
Automatic article extraction from HTML
causal/extractor
257750 Downloads
This extension detects and extracts metadata (EXIF / IPTC / XMP / ...) from potentially thousands of different file types (such as MS Word/PowerPoint/Excel documents, PDFs and images) and brings them automatically and natively to TYPO3 when uploading assets. It works with built-in PHP functions but takes advantage of Apache Tika and other external tools for enhanced metadata extraction.
iamgerwin/php-pdf-to-markdown-parser
4343 Downloads
A lightweight PHP library to convert PDF documents into clean, structured Markdown. Supports text extraction, headings, lists, tables, diagrams and code blocks for easier content reuse and publishing.
reelflow/reelflow-php
1155 Downloads
Elegant and powerful Instagram video downloader for seamless content extraction
kalimeromk/rssfeed
919 Downloads
Full-Text RSS extraction package for Laravel - converts partial RSS feeds to full content
fitzage/optimus-bard
3857 Downloads
Optimus Bard takes the content from a Statamic Bard field and transforms it into a string when updating your search index
vanry/readability
43 Downloads
Automatic article content extraction from HTML, plus an HTML parser.
iserter/php-goose
295 Downloads
PHP 8+ article/content extractor. Replacement for scotteh/php-goose (Goose)
manofstrong/sitescrapper
71 Downloads
A package to scrape websites from their sitemaps, extract relevant content from each web page, and upload it to a database
ncjoes/pdf-suite
275 Downloads
A high-level wrapper over Poppler-Php for PDF content extraction and conversion using Poppler utils
content-extract/content-processor
20 Downloads
Robust PHP library for batch document processing. Extracts content from PDFs/text and generates structured JSON according to user-defined schemas. Now with semantic structuring, OCR support for scanned PDFs, text normalization, and alias-driven field matching. Production-ready, secure, zero unnecessary dependencies.
sharpapi/laravel-content-detect-emails
1 Download
AI Email Detection for Laravel powered by SharpAPI.com
xtroo/php-client
14 Downloads
Xtroo PHP Client Library
techulus/capture
15 Downloads
Official PHP SDK for Capture (capture.page). Capture screenshots, generate PDFs, extract content and metadata from web pages.
tacman/php-readability
50 Downloads
Automatic article extraction from HTML, fork of j0k3r/php-readability