Package pdf-reader
Short Description A comprehensive Laravel package for extracting text, HTML, images, and metadata from PDF files using Poppler utilities.
License MIT
PDF Reader Package for Laravel
A comprehensive, production-ready Laravel package for extracting content from PDF files using Poppler utilities. This package provides a secure, type-safe interface for PDF manipulation with extensive error handling and validation.
Table of Contents
- Overview
- Features
- System Requirements
- Dependencies
- Installation
- Configuration
- Usage Guide
- Exception Handling
- Testing
- Architecture
- Troubleshooting
- License
Overview
The PDF Reader Package wraps the powerful Poppler command-line utilities in a clean, Laravel-friendly API. It handles PDF text extraction, HTML conversion, image extraction, and metadata retrieval with built-in validation, security, and error handling.
Why This Package?
- Secure: Uses Laravel's Process facade instead of unsafe shell_exec
- Validated: Checks file existence, readability, and PDF format before processing
- Type-Safe: Full PHP 8.2+ type hints for better IDE support
- Cross-Platform: Works on Windows, macOS, and Linux
- Well-Tested: Comprehensive Pest test suite included
- Production-Ready: Proper exception handling and logging support
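To illustrate the security point above, here is a minimal sketch of running a command as an argument list rather than a shell string. It uses PHP's built-in proc_open as a stand-in, not Laravel's Process facade, and runWithoutShell is a hypothetical helper that is not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Run a command given as an argument list, bypassing the shell.
 * Hypothetical helper for illustration; not part of the package.
 *
 * @param list<string> $command
 * @return array{exitCode: int, stdout: string, stderr: string}
 */
function runWithoutShell(array $command): array
{
    $descriptors = [
        1 => ['pipe', 'w'], // capture stdout
        2 => ['pipe', 'w'], // capture stderr
    ];

    // Since PHP 7.4, an array command makes proc_open start the program
    // directly, so arguments are never interpreted by a shell.
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start process: ' . $command[0]);
    }

    // Fine for small outputs; large stderr output would need non-blocking reads.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    return [
        'exitCode' => proc_close($process),
        'stdout'   => $stdout === false ? '' : $stdout,
        'stderr'   => $stderr === false ? '' : $stderr,
    ];
}

// An argument that would be dangerous inside a shell string reaches the
// child process as one literal value.
$result = runWithoutShell([PHP_BINARY, '-r', 'echo $argv[1];', '--', 'report; rm -rf tmp.pdf']);
echo $result['stdout'], PHP_EOL;
```

With shell_exec, the same file name would have to be escaped by hand before being concatenated into a command string; the argument-list form removes that step entirely.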
Features
Core Functionality
- Text Extraction - Extract plain text from PDFs with optional page ranges
- HTML Conversion - Convert PDFs to HTML while preserving layout
- Image Extraction - Extract all embedded images from PDFs
- Metadata Retrieval - Get PDF properties (author, title, page count, etc.)
Advanced Features
- Page Range Support - Extract specific pages (e.g., "1-5", "3-10")
- Input Validation - Automatic file existence and PDF format validation
- Secure Execution - Uses Laravel Process facade for safe command execution
- Custom Exceptions - Specific exceptions for different error scenarios
- File Management - Option to keep or auto-delete temporary files
- Cross-Platform - Proper path handling for all operating systems
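As a rough illustration of page range support, the sketch below turns a range string such as "1-5" into the -f (first page) and -l (last page) options that the Poppler tools accept. The function name pageRangeToArgs is hypothetical, and the package's own parsing may differ.

```php
<?php
declare(strict_types=1);

/**
 * Turn a page range such as "3" or "1-5" into Poppler's -f/-l arguments.
 * Hypothetical helper; the package's own parsing may differ.
 *
 * @return list<string>
 */
function pageRangeToArgs(string $range): array
{
    if (preg_match('/^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/', $range, $m) !== 1) {
        throw new InvalidArgumentException("Invalid page range: {$range}");
    }

    $first = (int) $m[1];
    // A single number such as "3" means page 3 only.
    $last = (isset($m[2]) && $m[2] !== '') ? (int) $m[2] : $first;

    if ($first < 1 || $last < $first) {
        throw new InvalidArgumentException("Invalid page range: {$range}");
    }

    return ['-f', (string) $first, '-l', (string) $last];
}

print_r(pageRangeToArgs('1-5'));
```

Returning the options as separate array elements keeps them ready for an argument-list style of process execution, with no string concatenation involved.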
System Requirements
Required Software
- PHP: 8.2 or higher
- Laravel: 10.0 or higher
- Poppler Utilities: All binaries must be installed and accessible
Poppler Binaries
The package requires the following Poppler command-line tools:
- pdftotext - Text extraction
- pdftohtml - HTML conversion
- pdfinfo - Metadata retrieval
- pdfimages - Image extraction
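One way to check whether these four tools are reachable is to search the PATH directly. The following is a minimal sketch under that assumption; findOnPath is a hypothetical helper and does not reflect how the package itself locates binaries.

```php
<?php
declare(strict_types=1);

/**
 * Search the PATH for an executable and return its full path, or null.
 * Hypothetical helper, not the package's lookup logic.
 */
function findOnPath(string $binary): ?string
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return null;
    }

    // Windows executables normally carry an .exe suffix.
    $suffixes = PHP_OS_FAMILY === 'Windows' ? ['.exe', ''] : [''];

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        foreach ($suffixes as $suffix) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $binary . $suffix;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }

    return null;
}

foreach (['pdftotext', 'pdftohtml', 'pdfinfo', 'pdfimages'] as $tool) {
    echo $tool, ': ', findOnPath($tool) ?? 'not found', PHP_EOL;
}
```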
Dependencies
Installing Poppler Utilities
Ubuntu/Debian
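Install the poppler-utils package (sudo apt-get install poppler-utils), then verify the installation by running pdftotext -v, which prints the installed version.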
macOS
Using Homebrew, install the poppler formula (brew install poppler), then verify the installation by running pdftotext -v.
Windows
- Download Poppler for Windows from GitHub Releases
- Extract the archive to a permanent location (e.g., C:\Program Files\poppler)
- Add the bin directory to your system PATH:
  - Right-click "This PC" → Properties → Advanced system settings
  - Environment Variables → System variables → Path → Edit
  - Add: C:\Program Files\poppler\Library\bin
- Restart your terminal/IDE
Verify the installation by running pdftotext -v in a new terminal.
Laravel Dependencies
This package uses the following Laravel features:
- Illuminate\Support\Facades\Process - For secure command execution
- Illuminate\Support\ServiceProvider - For package registration
- Illuminate\Support\Facades\Facade - For the PdfReader facade
All dependencies are included in Laravel 10+.
Installation
Step 1: Package Location
This package is located at:
It's already configured in your main composer.json under autoload-dev.
Step 2: Publish Configuration
Publish the package configuration file to your Laravel application:
This creates config/pdf-reader.php with default settings.
Step 3: Configure Binary Paths (Optional)
If Poppler binaries are not in your system PATH, specify full paths in .env:
Windows Example:
Step 4: Create Storage Directories
The package auto-creates these directories when needed, but you can create them manually:
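A minimal sketch of creating the directories from PHP follows. The directory names are taken from the Storage Locations table in this document; the function name and the $storageRoot parameter are assumptions, and in a real application you would pass the value of Laravel's storage_path().

```php
<?php
declare(strict_types=1);

/**
 * Create the three output directories under a Laravel storage root.
 * $storageRoot is an assumption: pass storage_path() in a real app.
 *
 * @return list<string> the directories that now exist
 */
function ensurePdfReaderDirectories(string $storageRoot): array
{
    $ready = [];
    foreach (['texts', 'htmls', 'images'] as $name) {
        $dir = implode(DIRECTORY_SEPARATOR, [
            rtrim($storageRoot, '/\\'), 'app', 'public', 'pdf-reader', $name,
        ]);

        // mkdir can race with another process, so re-check is_dir on failure.
        if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
            throw new RuntimeException("Unable to create directory: {$dir}");
        }
        $ready[] = $dir;
    }

    return $ready;
}

// Demo against a temporary location instead of a real storage folder.
print_r(ensurePdfReaderDirectories(sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pdf-reader-demo'));
```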
Configuration
Configuration File
The published config/pdf-reader.php file contains:
Configuration Options
| Key | Default | Description |
|---|---|---|
| pdftotext_binary | pdftotext | Path to pdftotext executable |
| pdftohtml_binary | pdftohtml | Path to pdftohtml executable |
| pdfinfo_binary | pdfinfo | Path to pdfinfo executable |
| pdfimages_binary | pdfimages | Path to pdfimages executable |
Note: If binaries are in your system PATH, you can use just the binary name. Otherwise, provide the full absolute path.
Usage Guide
Import the Facade
Text Extraction
Basic Text Extraction
Extract all text from a PDF:
Extract Specific Pages
Extract text from pages 1 to 5:
Extract text from a single page:
Keep Output File
By default, temporary files are deleted. To keep them:
Method Signature
HTML Conversion
Basic HTML Conversion
Convert entire PDF to HTML:
Convert Specific Pages
Keep Output File
Method Signature
Image Extraction
Extract All Images
Keep Image Files
Extract from Specific Pages
Save Images to Custom Location
Method Signature
Metadata Retrieval
Get PDF Information
Access Specific Metadata
Method Signature
Exception Handling
The package throws specific exceptions for different error scenarios.
Exception Hierarchy
InvalidPdfException
Thrown when:
- File doesn't exist
- File is not readable
- File is not a valid PDF
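The three checks listed above can be sketched in plain PHP as follows. This is an illustration only: assertLooksLikePdf is a hypothetical function, and InvalidArgumentException stands in for the package's InvalidPdfException, whose constructor is not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Reject paths that are missing, unreadable, or lack the "%PDF-" signature.
 * InvalidArgumentException is a stand-in for the package's InvalidPdfException.
 */
function assertLooksLikePdf(string $path): void
{
    if (!is_file($path)) {
        throw new InvalidArgumentException("File does not exist: {$path}");
    }
    if (!is_readable($path)) {
        throw new InvalidArgumentException("File is not readable: {$path}");
    }

    // A PDF begins with the bytes "%PDF-" followed by a version such as 1.7.
    $header = file_get_contents($path, false, null, 0, 5);
    if ($header !== '%PDF-') {
        throw new InvalidArgumentException("File is not a valid PDF: {$path}");
    }
}

$sample = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($sample, "%PDF-1.7\n");
assertLooksLikePdf($sample); // passes silently for a file with a PDF header
unlink($sample);
```

Reading only the first five bytes keeps the check cheap for large files; it confirms the signature, not that the rest of the file is well-formed.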
BinaryNotFoundException
Thrown when a required Poppler binary is not found:
PdfReaderException
Thrown for general extraction errors:
Complete Exception Handling
Testing
The package includes comprehensive Pest tests.
Run Package Tests
From your Laravel project root:
Test Coverage
The test suite covers:
- Text extraction with validation
- HTML conversion with page ranges
- Metadata retrieval and parsing
- Image extraction
- Exception handling (invalid files, missing binaries)
- Directory creation
- Cross-platform path handling
Example Test Output
Architecture
Package Structure
Service Provider
The PdfReaderServiceProvider registers the service as a singleton:
Facade
The PdfReader facade provides static access:
Service Class
PdfReaderService handles all PDF operations:
- Input validation
- Command building
- Process execution
- Error handling
- Output parsing
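To make the output parsing step concrete, here is a sketch of turning pdfinfo's "Key: value" lines into an associative array. The function parsePdfInfo is hypothetical and the sample text is made up for the demo; the service's real parser may differ.

```php
<?php
declare(strict_types=1);

/**
 * Parse pdfinfo-style "Key:   value" lines into an associative array.
 * Hypothetical helper; not the package's parser.
 *
 * @return array<string, string>
 */
function parsePdfInfo(string $output): array
{
    $info = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Split on the first colon only: values such as dates contain colons too.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        $key = trim($parts[0]);
        if ($key === '') {
            continue;
        }
        $info[$key] = trim($parts[1]);
    }

    return $info;
}

// Made-up sample in the shape pdfinfo prints.
$sample = "Title:          Annual Report\nPages:          12\nCreationDate:   Mon Jan  1 10:30:00 2024\n";
$info = parsePdfInfo($sample);
echo $info['Pages'], PHP_EOL;
```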
Troubleshooting
Binary Not Found
Error: BinaryNotFoundException: The required binary 'pdftotext' was not found
Solutions:
- Verify Poppler is installed: which pdftotext (Linux/Mac) or where pdftotext (Windows)
- Add binary paths to .env
- Ensure binaries are in system PATH
Permission Denied
Error: InvalidPdfException: The file is not readable
Solutions:
- Check file permissions: ls -la /path/to/file.pdf
- Ensure the web server user has read access
Invalid PDF
Error: InvalidPdfException: The file is not a valid PDF
Solutions:
- Verify file is actually a PDF: file /path/to/file.pdf
- Check file isn't corrupted
- Ensure file has proper PDF header (%PDF-)
Output Directory Not Created
Error: Permission issues with storage/app/public/pdf-reader
Solutions:
- Ensure the storage directory is writable
- Create the directories manually
Windows Path Issues
Error: Mixed path separators causing issues
Solution: The package uses DIRECTORY_SEPARATOR for cross-platform compatibility. Ensure you're using the latest version.
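For illustration, joining path segments with DIRECTORY_SEPARATOR might look like the sketch below. The joinPath helper is hypothetical. It assumes both slash styles are meant as separators, so it is not suitable for UNC paths or for file names that contain a literal backslash.

```php
<?php
declare(strict_types=1);

/**
 * Join path segments using the platform separator, treating both "/" and "\"
 * in the input as separators and collapsing repeats.
 * Hypothetical helper; not taken from the package.
 */
function joinPath(string ...$segments): string
{
    $parts = preg_split('#[\\\\/]+#', implode('/', $segments));

    return implode(DIRECTORY_SEPARATOR, $parts);
}

echo joinPath('storage', 'app/public', 'pdf-reader\\texts'), PHP_EOL;
```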
Output Files
Storage Locations
When keepFile: true or keepFiles: true, extracted files are saved to:
| Type | Location |
|---|---|
| Text | storage/app/public/pdf-reader/texts/ |
| HTML | storage/app/public/pdf-reader/htmls/ |
| Images | storage/app/public/pdf-reader/images/ |
File Naming Convention
- Text: pdf-text-{timestamp}.txt
- HTML: pdf-html-{timestamp}.html
- Images: pdf-img-{timestamp}-{number}.{ext}
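The naming patterns above can be reproduced with a small helper, sketched here under two assumptions that the document does not confirm: that {timestamp} is a Unix timestamp in seconds, and that {number} is a plain integer without padding. The function outputFileName is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build a file name following the documented patterns.
 * Assumes a Unix timestamp and an unpadded image number (both unconfirmed).
 */
function outputFileName(string $type, int $timestamp, string $extension, ?int $number = null): string
{
    $prefixes = ['text' => 'pdf-text', 'html' => 'pdf-html', 'image' => 'pdf-img'];
    if (!isset($prefixes[$type])) {
        throw new InvalidArgumentException("Unknown output type: {$type}");
    }

    $name = $prefixes[$type] . '-' . $timestamp;
    if ($number !== null) {
        $name .= '-' . $number; // only image names carry a number
    }

    return $name . '.' . ltrim($extension, '.');
}

echo outputFileName('text', time(), 'txt'), PHP_EOL;
echo outputFileName('image', time(), 'png', 1), PHP_EOL;
```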
Accessing Saved Files
Best Practices
1. Always Handle Exceptions
2. Validate Input Before Processing
3. Clean Up Temporary Files
4. Use Page Ranges for Large PDFs
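One possible way to apply this practice is to split a document into fixed-size page ranges and process them one at a time. The sketch below only produces the range strings in the "1-5" format used earlier; pageChunks is a hypothetical helper, and the total page count would come from the PDF's metadata.

```php
<?php
declare(strict_types=1);

/**
 * Split a page count into consecutive "first-last" ranges.
 * Hypothetical helper for illustration.
 *
 * @return list<string>
 */
function pageChunks(int $totalPages, int $chunkSize): array
{
    if ($totalPages < 1 || $chunkSize < 1) {
        throw new InvalidArgumentException('Page count and chunk size must be positive.');
    }

    $ranges = [];
    for ($first = 1; $first <= $totalPages; $first += $chunkSize) {
        $last = min($first + $chunkSize - 1, $totalPages);
        $ranges[] = "{$first}-{$last}";
    }

    return $ranges;
}

print_r(pageChunks(12, 5)); // 1-5, 6-10, 11-12
```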
5. Configure Binaries in Environment
License
MIT License
Copyright (c) 2024 Shibashish
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Composer Dependencies
- illuminate/support: ^10.0|^11.0|^12.0
- illuminate/process: ^10.0|^11.0|^12.0