Package joest8/pdfinterpreter
License MIT
Pdf Interpreter
Introduction
This class is designed to convert multiple PDF files, whether image-based or text-based, into an array of data. The class uses user-defined templates containing regular expressions to control the data extraction process, allowing for customized and flexible output.
Table of Contents
This README is divided into several sections:
- Installation
  - Console Applications
  - Automated installation
  - Manual installation with homebrew
  - Manual installation of Tesseract Language Files
- Usage
  - Create Object
  - Get Sample Output
  - Set new template
  - Add pattern to template
  - Get Template
  - Delete Template
  - Convert Files from Folder
  - Convert File
Installation
Console Applications
To use this class, you'll need to install the following applications:
- Poppler (converts PDF to text and reports the number of pages in a file)
- Tesseract (reads and interprets PNG files via OCR)
- ImageMagick (converts PDF pages to PNG)
Make sure a package manager is installed on your system.
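Before installing anything, it can help to check which of these tools are already available. The following is a minimal sketch and not part of the package. It assumes the tools provide the executables pdftotext and pdfinfo (Poppler), tesseract, and magick (ImageMagick 7; older releases ship convert instead). The helper name missing_tools is hypothetical.

```shell
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of pdfinterpreter.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed executable names: pdftotext/pdfinfo (Poppler), tesseract,
# magick (ImageMagick 7; ImageMagick 6 provides "convert" instead).
missing=$(missing_tools pdftotext pdfinfo tesseract magick)

if [ -z "$missing" ]; then
  echo "All required tools were found."
else
  echo "Missing tools:"
  echo "$missing"
fi
```

If the script lists any names, install the corresponding application with your package manager and run the check again.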
Automated installation
Run the following code from the source folder to install all dependencies and the Tesseract language files automatically:
Manual installation with homebrew
If Homebrew is installed, run the following commands to install the required packages:
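A plausible form of this step is sketched below. It assumes the Homebrew formula names poppler, tesseract, and imagemagick, which correspond to the three applications listed under Console Applications. The function brew_install_deps is a hypothetical wrapper: by default it only prints the command so you can review it, and it runs the installation when called with the argument run.

```shell
# brew_install_deps is a hypothetical helper, not part of pdfinterpreter.
# Without arguments it prints the install command; with "run" it executes it.
brew_install_deps() {
  mode="${1:-print}"
  # Assumed Homebrew formula names for Poppler, Tesseract and ImageMagick.
  set -- poppler tesseract imagemagick
  if [ "$mode" = "run" ]; then
    brew install "$@"
  else
    echo "brew install $*"
  fi
}

# Dry run: prints "brew install poppler tesseract imagemagick"
brew_install_deps
```

Calling brew_install_deps run performs the actual installation. Running the printed command directly in a terminal has the same effect.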
Manual installation of Tesseract Language Files
You also need to install the required Tesseract language files. You can check the available languages at: https://github.com/tesseract-ocr/tessdata_best/
Download the necessary language files and place them in the appropriate directory. To find the directory use:
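As a sketch of the download step, the helpers below build the raw-file URL for a language code and fetch the file with curl. This is an illustration, not the package's own installer: the function names are hypothetical, and the URL layout assumes that the tessdata_best repository serves its files from the main branch as <language>.traineddata.

```shell
# Base URL of the raw files in tessdata_best (branch "main" is an assumption).
TESSDATA_BASE_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main"

# Print the download URL for one language code, e.g. eng or deu.
traineddata_url() {
  echo "$TESSDATA_BASE_URL/$1.traineddata"
}

# Download one language file into a target directory (hypothetical helper).
# Usage: fetch_traineddata <language-code> <tessdata-directory>
fetch_traineddata() {
  curl -fL -o "$2/$1.traineddata" "$(traineddata_url "$1")"
}

# Show the URL that would be downloaded for German; nothing is fetched here.
traineddata_url deu
```

To download a file, call fetch_traineddata with the language code and your Tesseract data directory. Afterwards, tesseract --list-langs should include the new language.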
Usage
Create Object
Get Sample Output
The get_sample_output method returns a sample of the text output without applying any patterns.
Set new template
The add_new_template method creates a new template.
For more information about the required parameters, read the method's DocBlock.
Add pattern to template
The add_pattern_to_template method adds a new pattern to an existing template.
For more information about the required parameters, read the method's DocBlock.
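To illustrate the general idea of pattern-based extraction, the sketch below pulls single fields out of plain text using a regular expression with one capture group. It is not the package's template format or API: the sample text is made up, extract_field is a hypothetical helper, and it uses sed with POSIX extended regular expressions, whose syntax differs in places from the PCRE syntax used by PHP.

```shell
# Sample text as it might come out of a PDF (made-up data).
sample_text='Invoice No: 2024-0042
Date: 2024-03-15
Total: 199.90 EUR'

# extract_field is a hypothetical helper: for every input line that matches
# the extended regular expression in $2, print its first capture group.
extract_field() {
  printf '%s\n' "$1" | sed -nE "s/$2/\1/p"
}

extract_field "$sample_text" '^Invoice No: ([0-9-]+)$'   # prints 2024-0042
extract_field "$sample_text" '^Total: ([0-9.]+) EUR$'    # prints 199.90
```

Testing an expression on sample text like this before adding it to a template can make it easier to spot patterns that match too much or nothing at all.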
Get Template
The get_template method returns the entire template.
For more information about the required parameters, read the method's DocBlock.
Delete Template
The delete_template method deletes the entire template.
For more information about the required parameters, read the method's DocBlock.
Convert Files from Folder
The convert_folder method converts all files in a folder into an array of data.
For more information about the required parameters, read the method's DocBlock.
Convert File
The convert_file method converts a single file into an array of data.
For more information about the required parameters, read the method's DocBlock.