Libraries tagged by text-extraction
kreuzberg/kreuzberg
164 Downloads
High-performance document intelligence library
silverstripe/textextraction
186816 Downloads
Text Extraction API for SilverStripe CMS (mostly used with 'fulltextsearch' module)
iamgerwin/php-pdf-to-markdown-parser
4443 Downloads
A lightweight PHP library to convert PDF documents into clean, structured Markdown. Supports text extraction, headings, lists, tables, diagrams and code blocks for easier content reuse and publishing.
oxide/pdf-oxide
3 Downloads
PDF processing toolkit (Rust-backed, FFI-bound) for PHP
keyvan/german-ocr
0 Downloads
High-performance German document OCR - Local & Cloud API
jcfrane/pdf-text-extractor
197 Downloads
A Laravel PDF text extraction package with multiple strategies (PdfParser, XObject, AWS Textract, Tesseract OCR). Handles Canva-generated PDFs, scanned documents, and other edge cases with automatic fallback.
daniel-jorg-schuppelius/php-pdf-toolkit
304 Downloads
PHP 8.2+ library for PDF text extraction with automatic reader selection. Supports embedded text and scanned documents via OCR.
moinul/laravel-pdf-to-html
100 Downloads
A Laravel package to convert PDF files to HTML using poppler-utils
manofstrong/sitescrapper
71 Downloads
A package that scrapes websites via their sitemaps, extracts relevant content from each page, and uploads it to a database
aspose/pdf
850 Downloads
A powerful library for manipulating and converting PDF files.
xatham/text-extraction
17 Downloads
Easy text extraction for many different file types
teon/text-extraction
607 Downloads
Text Extraction Library
centertap/tika-all-the-files
107 Downloads
MediaWiki extension that extracts searchable text and metadata from uploaded files via Apache Tika
mayaram/laravel-ocr
1330 Downloads
Laravel OCR & Document Data Extractor - A powerful OCR and document parsing engine for Laravel
cryde/json-text-extractor
8578 Downloads
Helper that extracts JSON from plain text