Download the PHP package nilgems/laravel-textract without Composer
On this page you can find all versions of the php package nilgems/laravel-textract. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Information about the package laravel-textract
Laravel Textract
A Laravel package for extracting text from files such as DOC, Excel, image, PDF, and more.
Versions and compatibility
- Laravel 10 or higher is required.
- PHP 8.2 or higher is required.
Supported file formats
The following file formats are currently supported. To work with all of them, you need to install the appropriate extensions on your server. The package checks the MIME type of the file content before running an extraction.
- HTML
- TEXT
- DOC
- DOCX
- XLS, XLSX, XLSM, XLTX, XLTM, XLT
- CSV
- Image
  - jpeg
  - png
  - gif
- ODT
- ODS
- RTF
- PPTX (NEW)
We are working hard to make this Laravel plugin useful. If you find an issue, please post it in the discussions.
Installation
Once installed you can do stuff like this:
Run the extractor on any supported file:
Option | Type | Default value | Required | Description |
---|---|---|---|---|
$file_path | | No default value | Yes | Absolute path of the file to extract text from. |
$job_id | | | No | Optional. The extraction job ID. If left blank, the plugin creates an ID automatically. |
$extra_data | | | No | Optional. Passes extra parameters. When extracting an image file, you can specify languages and more through this parameter. |
Configuration
- Optionally, add the service provider in your Laravel project. This step is not required, because the package loads its service provider into your application automatically.
- Optionally, add the alias in your Laravel project. This step is not required, because the package loads it into your application automatically.
- To publish the file, run:
Example
Example 1:
You can extract text from any supported file format.
For better performance, it is recommended to run the extractor in a Laravel queue job.
PHP enforces execution-time and memory limits that are set in its configuration. If the file is large, the process may be killed when it exceeds one of these limits. Running the extraction in the background avoids this.
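To see which limits apply on a given machine, a small check such as the one below may help. It is a sketch that assumes the two options meant above are PHP's standard `max_execution_time` and `memory_limit` directives; the function name `show_php_limits` is made up for this illustration.

```shell
# Print the two PHP limits as seen by the command-line PHP binary.
# Assumption: the limits referred to above are max_execution_time and memory_limit.
show_php_limits() {
  php -r 'foreach (["max_execution_time", "memory_limit"] as $k) { echo $k, "=", ini_get($k), PHP_EOL; }'
}

# Only run the check when a php binary is actually on PATH.
if command -v php >/dev/null 2>&1; then
  show_php_limits
else
  echo "php not found on PATH" >&2
fi
```

The command-line PHP often loads a different configuration from the one used by the web server, so the values printed here describe CLI processes such as a worker started with `php artisan queue:work`, not necessarily web requests.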
Example 2:
You can specify the languages used in an image file to improve the extraction output.
Dependencies
- To enable the image extraction feature you need to install Tesseract OCR
- To enable the PDF extraction feature you need to install pdftotext
- To work properly, your server must have the following PHP extensions installed:
  - ext-fileinfo
  - ext-zip
  - ext-gd or ext-imagick
  - ext-xml
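The requirements above can be checked from a terminal before using the package. The following is a minimal sketch, not part of the package: the helper names `have_cmd` and `report_cmd` are invented here, and the script only reports what it finds.

```shell
# Returns 0 if the given command is on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints "<name>: ok" or "<name>: missing" for one command-line tool.
report_cmd() {
  if have_cmd "$1"; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

report_cmd tesseract   # needed for image extraction
report_cmd pdftotext   # needed for PDF extraction

# `php -m` lists the loaded extensions, one name per line.
if have_cmd php; then
  loaded="$(php -m | tr '[:upper:]' '[:lower:]')"
  for ext in fileinfo zip xml gd imagick; do
    if printf '%s\n' "$loaded" | grep -qx "$ext"; then
      echo "ext-$ext: ok"
    else
      echo "ext-$ext: missing"
    fi
  done
else
  echo "php: missing"
fi
```

Only one of `ext-gd` and `ext-imagick` is required, so a single "missing" line for that pair is not a problem. The check reflects the command-line PHP, which may load different extensions from the PHP used by the web server.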
Tesseract OCR Installation
Ubuntu
- Update the system:
- Add Tesseract OCR 5 PPA to your system:
- Install Tesseract on Ubuntu 20.04 | 18.04:
- Once installation is complete update your system:
- Verify the installation:
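The commands for these steps are not shown on this page. As a rough sketch, they could look like the function below. The PPA name `ppa:alex-p/tesseract-ocr5` is an assumption (it can be overridden through the hypothetical `TESSERACT_PPA` variable), and the function name is invented for this example.

```shell
# Sketch only: this defines a function and runs nothing until you call it.
# Assumption: the "Tesseract OCR 5 PPA" is ppa:alex-p/tesseract-ocr5.
install_tesseract_ubuntu() {
  ppa="${TESSERACT_PPA:-ppa:alex-p/tesseract-ocr5}"
  sudo apt update || return 1                      # 1. update the system
  sudo add-apt-repository -y "$ppa" || return 1    # 2. add the Tesseract 5 PPA
  sudo apt install -y tesseract-ocr || return 1    # 3. install Tesseract
  sudo apt update || return 1                      # 4. update the system again
  tesseract --version                              # 5. verify the installation
}
```

Call `install_tesseract_ubuntu` to run the steps in order. Each step stops the function if it fails, so a failed PPA setup does not lead to installing an older Tesseract from the default repositories.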
Windows
- There are many ways to install Tesseract OCR on your system, but if you just want something quick to get up and running, I recommend installing the Capture2Text package with Chocolatey.
- Choco installation:
Note: Recent versions of Capture2Text stopped shipping the Tesseract binary.
PdfToText Installation
Ubuntu
- Update the system:
- Install PdfToText on Ubuntu 20.04 | 18.04:
- Verify the installation:
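The commands for these steps are not shown on this page either. On Ubuntu, `pdftotext` is shipped in the `poppler-utils` package, so the steps could be sketched as below. The function name is invented for this example.

```shell
# Sketch only: this defines a function and runs nothing until you call it.
install_pdftotext_ubuntu() {
  sudo apt update || return 1                     # 1. update the system
  sudo apt install -y poppler-utils || return 1   # 2. poppler-utils provides pdftotext
  pdftotext -v                                    # 3. verify: prints version information
}
```

Call `install_pdftotext_ubuntu` to run the three steps in order.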
Windows
- pdftotext is provided by Poppler, and Poppler is not yet available for Windows. You can instead install and use it through the Windows Subsystem for Linux (WSL). Alternatively, you can install Laravel Homestead in your project and use Vagrant virtualization to run the project on an Ubuntu virtual server.
License
All versions of laravel-textract with dependencies
ext-fileinfo Version *
ext-zip Version *
ext-xml Version *
ext-gd Version *
symfony/process Version 6.4.3
phpoffice/phpspreadsheet Version ^1.23
phpoffice/phpword Version ^0.18
laravel/framework Version ^10.0
thiagoalessio/tesseract_ocr Version ^2.12
html2text/html2text Version ^4.3
phpoffice/phppresentation Version ^1.0