Libraries tagged by content extraction

j0k3r/php-readability

190 Favers
869769 Downloads

Automatic article extraction from HTML


causal/extractor

16 Favers
257750 Downloads

This extension detects and extracts metadata (EXIF / IPTC / XMP / ...) from potentially thousands of different file types (such as MS Word/PowerPoint/Excel documents, PDFs and images) and brings it automatically and natively into TYPO3 when assets are uploaded. It works with built-in PHP functions but takes advantage of Apache Tika and other external tools for enhanced metadata extraction.


iamgerwin/php-pdf-to-markdown-parser

5 Favers
4343 Downloads

A lightweight PHP library to convert PDF documents into clean, structured Markdown. Supports text extraction, headings, lists, tables, diagrams and code blocks for easier content reuse and publishing.


reelflow/reelflow-php

6 Favers
1155 Downloads

Elegant and powerful Instagram video downloader for seamless content extraction


kalimeromk/rssfeed

3 Favers
919 Downloads

Full-Text RSS extraction package for Laravel: converts partial RSS feeds into full-content feeds


fitzage/optimus-bard

3 Favers
3857 Downloads

Optimus Bard takes the content from a Statamic Bard field and transforms it into a string when updating your search index


vanry/readability

5 Favers
43 Downloads

Automatic article content extraction from HTML, plus an HTML parser.


iserter/php-goose

0 Favers
295 Downloads

PHP 8+ article/content extractor; a replacement for scotteh/php-goose (Goose).


manofstrong/sitescrapper

6 Favers
71 Downloads

A package that scrapes websites via their sitemaps, extracts the relevant content from each webpage, and uploads it to a database.


ncjoes/pdf-suite

8 Favers
275 Downloads

A high-level wrapper over Poppler-Php for PDF content extraction and conversion using the Poppler utilities.


content-extract/content-processor

1 Favers
20 Downloads

Robust PHP library for batch document processing. Extracts content from PDFs/text and generates structured JSON according to user-defined schemas. Now with semantic structuring, OCR support for scanned PDFs, text normalization, and alias-driven field matching. Production-ready, secure, zero unnecessary dependencies.


sharpapi/laravel-content-detect-emails

1 Favers
1 Downloads

AI Email Detection for Laravel powered by SharpAPI.com


xtroo/php-client

1 Favers
14 Downloads

Xtroo PHP Client Library


techulus/capture

0 Favers
15 Downloads

Official PHP SDK for Capture (capture.page). Capture screenshots, generate PDFs, extract content and metadata from web pages.


tacman/php-readability

0 Favers
50 Downloads

Automatic article extraction from HTML; a fork of j0k3r/php-readability.
