Libraries tagged by content extraction

j0k3r/php-readability

190 Favers
931326 Downloads

Automatic article extraction from HTML

This extension detects and extracts metadata (EXIF / IPTC / XMP / ...) from potentially thousands of different file types (such as MS Word/PowerPoint/Excel documents, PDFs, and images) and brings it automatically and natively to TYPO3 when assets are uploaded. It works with built-in PHP functions but takes advantage of Apache Tika and other external tools for enhanced metadata extraction.

iamgerwin/php-pdf-to-markdown-parser

8 Favers
9791 Downloads

A lightweight PHP library to convert PDF documents into clean, structured Markdown. Supports text extraction, headings, lists, tables, diagrams and code blocks for easier content reuse and publishing.

reelflow/reelflow-php

7 Favers
1502 Downloads

Elegant and powerful Instagram video downloader for seamless content extraction

iserter/php-goose

0 Favers
528 Downloads

PHP 8+ article/content extractor. Replacement for scotteh/php-goose (Goose).

fitzage/optimus-bard

3 Favers
3976 Downloads

Optimus Bard takes the content from a Statamic Bard field and transforms it into a string when updating your search index

vanry/readability

5 Favers
43 Downloads

Automatic article content extraction from HTML, plus an HTML parser.

kalimeromk/rssfeed

4 Favers
924 Downloads

Full-Text RSS extraction package for Laravel - converts partial RSS feeds to full content

manofstrong/sitescrapper

6 Favers
71 Downloads

A package to scrape websites from their sitemaps, extract relevant content from each webpage, and upload it to a database

cosmira/envelope

0 Favers
80 Downloads

An elegant PHP library for parsing and extracting structured email contents, including attachments and metadata.

ncjoes/pdf-suite

8 Favers
276 Downloads

A high-level wrapper over Poppler-Php for PDF content extraction and conversion using Poppler utils

content-extract/content-processor

1 Favers
20 Downloads

Robust PHP library for batch document processing. Extracts content from PDFs/text and generates structured JSON according to user-defined schemas. Now with semantic structuring, OCR support for scanned PDFs, text normalization, and alias-driven field matching. Production-ready, secure, zero unnecessary dependencies.
