Tesseract OCR for PHP
A wrapper to work with Tesseract OCR inside PHP.
[![CI][ci_badge]][ci]
[![AppVeyor][appveyor_badge]][appveyor]
[![Codacy][codacy_badge]][codacy]
[![Test Coverage][test_coverage_badge]][test_coverage]
[![Latest Stable Version][stable_version_badge]][packagist]
[![Total Downloads][total_downloads_badge]][packagist]
[![Monthly Downloads][monthly_downloads_badge]][packagist]
Installation
Via [Composer][]:
$ composer require thiagoalessio/tesseract_ocr
:bangbang: This library depends on [Tesseract OCR][], version 3.02 or later.
![][windows_icon] Note for Windows users
There are [many ways][tesseract_installation_on_windows] to install [Tesseract OCR][] on your system, but if you just want something quick to get up and running, I recommend installing the [Capture2Text][] package with [Chocolatey][].
choco install capture2text --version 3.9
:warning: Recent versions of [Capture2Text][] stopped shipping the `tesseract` binary.
![][macos_icon] Note for macOS users
With [MacPorts][] you can install support for individual languages, like so:
$ sudo port install tesseract-<langcode>
But that is not possible with [Homebrew][]. It ships with English support only by default, so if you intend to use it for other languages, the quickest solution is to install them all:
$ brew install tesseract tesseract-lang
Usage
Basic usage
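A minimal sketch of the simplest call. It assumes Composer's autoloader is already loaded; `recognizeText` and the `text.png` path are hypothetical names used only for illustration.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier, e.g. require 'vendor/autoload.php';
// recognizeText() is a hypothetical helper, not part of the library.
function recognizeText(string $imagePath): string
{
    // The image path is passed to the constructor; run() executes tesseract
    // and returns the recognized text.
    return (new TesseractOCR($imagePath))->run();
}
```

For example: `echo recognizeText('text.png');`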
Other languages
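A short sketch of recognizing text in a language other than English, here German (`deu`). It assumes the matching language data is installed and Composer's autoloader is loaded; `recognizeGerman` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier and the 'deu' traineddata is installed.
// recognizeGerman() is a hypothetical helper, not part of the library.
function recognizeGerman(string $imagePath): string
{
    return (new TesseractOCR($imagePath))
        ->lang('deu') // Tesseract language code for German
        ->run();
}
```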
Multiple languages
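When an image mixes several languages, `lang` can be given more than one code. The sketch below assumes the corresponding language data is installed; `recognizeInLanguages` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeInLanguages() is a hypothetical helper, not part of the library.
function recognizeInLanguages(string $imagePath, string ...$langs): string
{
    return (new TesseractOCR($imagePath))
        ->lang(...$langs) // same as calling ->lang('eng', 'jpn', 'spa')
        ->run();
}
```

For example: `echo recognizeInLanguages('mixed-languages.png', 'eng', 'jpn', 'spa');`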
Inducing recognition
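Recognition can be steered by restricting the characters tesseract is allowed to produce. A minimal sketch using `allowlist` with uppercase letters only; `recognizeUppercaseOnly` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeUppercaseOnly() is a hypothetical helper, not part of the library.
function recognizeUppercaseOnly(string $imagePath): string
{
    return (new TesseractOCR($imagePath))
        ->allowlist(range('A', 'Z')) // only A..Z may appear in the result
        ->run();
}
```

With such a restriction, an ambiguous glyph is more likely to be read as a letter (for example `B`) than as a digit (for example `8`).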
Breaking CAPTCHAs
Yes, I know some of you might want to use this library for the noble purpose of breaking CAPTCHAs, so please take a look at this comment:
https://github.com/thiagoalessio/tesseract-ocr-for-php/issues/91#issuecomment-342290510
API
run
Executes a `tesseract` command, optionally receiving an integer as `timeout`, in case you experience stalled tesseract processes.
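A sketch of both forms of `run`. The unit of the timeout value is not stated here, so the number is passed through as-is; `recognizeWithTimeout` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeWithTimeout() is a hypothetical helper, not part of the library.
function recognizeWithTimeout(string $imagePath, ?int $timeout = null): string
{
    $ocr = new TesseractOCR($imagePath);

    // Without a timeout, run() simply waits for tesseract to finish.
    if ($timeout === null) {
        return $ocr->run();
    }

    // With a timeout, a stalled tesseract process does not block forever.
    return $ocr->run($timeout);
}
```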
image
Define the path of an image to be recognized by `tesseract`.
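A minimal sketch of setting the image after construction instead of in the constructor; `recognizeLater` and the path are hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeLater() is a hypothetical helper, not part of the library.
function recognizeLater(string $imagePath): string
{
    return (new TesseractOCR())
        ->image($imagePath) // e.g. '/path/to/image.png'
        ->run();
}
```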
imageData
Set the image to be recognized by `tesseract` from a string, with its size.
This can be useful when dealing with files that are already loaded in memory.
You can easily retrieve the image data and size of an image object:
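As an illustration, the sketch below assumes the image is a GD image held in memory (the GD extension is an assumption of this example). It renders the image to a PNG string through an output buffer and hands the string and its length to `imageData`. `recognizeGdImage` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier and the GD extension is available.
// recognizeGdImage() is a hypothetical helper, not part of the library.
function recognizeGdImage($img): string
{
    // Capture the PNG bytes in an output buffer instead of writing a file.
    ob_start();
    imagepng($img, null, 0); // quality 0 = no compression
    $size = ob_get_length();
    $data = ob_get_clean();

    $ocr = new TesseractOCR();
    $ocr->imageData($data, $size);
    return $ocr->run();
}
```

Any image format supported by tesseract can be used in place of PNG.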
executable
Define a custom location of the `tesseract` executable, if for any reason it is not present in the `$PATH`.
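A short sketch of pointing the library at a specific binary; the executable path is a placeholder and `recognizeWithBinary` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeWithBinary() is a hypothetical helper, not part of the library.
function recognizeWithBinary(string $imagePath, string $tesseractPath): string
{
    return (new TesseractOCR($imagePath))
        ->executable($tesseractPath) // e.g. '/path/to/tesseract'
        ->run();
}
```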
version
Returns the current version of `tesseract`.
availableLanguages
Returns a list of available languages/scripts.
More info: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages-and-scripts
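Both `version` and `availableLanguages` query the installed tesseract and need no image. A minimal sketch that collects the two; `describeTesseract` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// describeTesseract() is a hypothetical helper, not part of the library.
function describeTesseract(): array
{
    $ocr = new TesseractOCR();

    return [
        'version'   => $ocr->version(),            // version of the tesseract binary
        'languages' => $ocr->availableLanguages(), // list of language/script codes
    ];
}
```

For example: `foreach (describeTesseract()['languages'] as $lang) { echo $lang, PHP_EOL; }`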
tessdataDir
Specify a custom location for the tessdata directory.
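A short sketch, with a placeholder directory; `recognizeWithTessdata` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeWithTessdata() is a hypothetical helper, not part of the library.
function recognizeWithTessdata(string $imagePath, string $tessdataDir): string
{
    return (new TesseractOCR($imagePath))
        ->tessdataDir($tessdataDir) // e.g. '/path/to/tessdata'
        ->run();
}
```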
userWords
Specify the location of user words file.
This is a plain text file containing a list of words that you want `tesseract` to treat as normal dictionary words.
Useful when dealing with contents that contain technical terminology, jargon, etc.
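A sketch of preparing and using a user words file, assuming one word per line. Both helper names and the file path are hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// Both helpers are hypothetical and not part of the library.

// Writes the words to a plain text file, one word per line.
function writeUserWords(array $words, string $path): void
{
    file_put_contents($path, implode("\n", $words) . "\n");
}

function recognizeWithUserWords(string $imagePath, string $wordsPath): string
{
    return (new TesseractOCR($imagePath))
        ->userWords($wordsPath) // e.g. '/path/to/user-words.txt'
        ->run();
}
```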
userPatterns
Specify the location of user patterns file.
If the contents you are dealing with have known patterns, this option can considerably improve tesseract's recognition accuracy.
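A short sketch of passing a patterns file. The file is assumed to hold one pattern per line in Tesseract's own pattern syntax, which is not described here; `recognizeWithUserPatterns` and the path are hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeWithUserPatterns() is a hypothetical helper, not part of the library.
function recognizeWithUserPatterns(string $imagePath, string $patternsPath): string
{
    return (new TesseractOCR($imagePath))
        ->userPatterns($patternsPath) // e.g. '/path/to/user-patterns.txt'
        ->run();
}
```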
lang
Define one or more languages to be used during the recognition. A complete list of available languages can be found at: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages
Tip from [@daijiale][]: Use the combination `->lang('chi_sim', 'chi_tra')` for proper recognition of Chinese.
psm
Specify the Page Segmentation Method, which instructs `tesseract` how to interpret the given image.
More info: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
oem
Specify the OCR Engine Mode (see `tesseract --help-oem`).
dpi
Specify the image DPI. It is useful if your image does not contain this information in its metadata.
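The `psm`, `oem` and `dpi` options can be chained on the same call. A minimal sketch; the values in the usage line are illustrative only, `oem` is assumed to need a tesseract build that supports `--oem`, and `recognizeTuned` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeTuned() is a hypothetical helper, not part of the library.
function recognizeTuned(string $imagePath, int $psm, int $oem, int $dpi): string
{
    return (new TesseractOCR($imagePath))
        ->psm($psm) // page segmentation method, see `tesseract --help-psm`
        ->oem($oem) // OCR engine mode, see `tesseract --help-oem`
        ->dpi($dpi) // resolution to assume when the image metadata lacks it
        ->run();
}
```

For example: `echo recognizeTuned('img.png', 6, 1, 300);`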
allowlist
This is a shortcut for `->config('tessedit_char_whitelist', 'abcdef....')`.
configFile
Specify a config file to be used. It can either be the path to your own config file or the name of one of the predefined config files: https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs
setOutputFile
Specify an output file to be used. Be aware: if you set an output file, the option `withoutTempFiles` is ignored.
Temp files are written (and deleted) even if `withoutTempFiles = true`.
In combination with `configFile` you are able to get the `hocr`, `tsv` or `pdf` files.
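A sketch of combining `configFile` with `setOutputFile` to produce a searchable PDF; the paths are placeholders and `createSearchablePdf` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// createSearchablePdf() is a hypothetical helper, not part of the library.
function createSearchablePdf(string $imagePath, string $pdfPath): void
{
    (new TesseractOCR($imagePath))
        ->configFile('pdf')        // predefined config that makes tesseract emit a PDF
        ->setOutputFile($pdfPath)  // e.g. '/path/to/searchable.pdf'
        ->run();
}
```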
digits
Shortcut for `->configFile('digits')`.
hocr
Shortcut for `->configFile('hocr')`.
pdf
Shortcut for `->configFile('pdf')`.
quiet
Shortcut for `->configFile('quiet')`.
tsv
Shortcut for `->configFile('tsv')`.
txt
Shortcut for `->configFile('txt')`.
tempDir
Define a custom directory to store temporary files generated by tesseract.
Make sure the directory actually exists and the user running `php` is allowed to write in there.
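The sketch below checks the two conditions above before handing the directory to `tempDir`. The up-front check is an addition of this example, not something the library is known to do; `recognizeUsingTempDir` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeUsingTempDir() is a hypothetical helper, not part of the library.
function recognizeUsingTempDir(string $imagePath, string $tempDir): string
{
    // Fail early if the directory is missing or not writable by the PHP user.
    if (!is_dir($tempDir) || !is_writable($tempDir)) {
        throw new InvalidArgumentException("Temp directory not usable: {$tempDir}");
    }

    return (new TesseractOCR($imagePath))
        ->tempDir($tempDir) // e.g. '/path/to/temp/dir'
        ->run();
}
```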
withoutTempFiles
Specify that `tesseract` should output the recognized text without writing to temporary files.
The data is gathered from the standard output of `tesseract` instead.
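A minimal sketch; `recognizeViaStdout` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeViaStdout() is a hypothetical helper, not part of the library.
function recognizeViaStdout(string $imagePath): string
{
    return (new TesseractOCR($imagePath))
        ->withoutTempFiles() // read the text from tesseract's standard output
        ->run();
}
```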
Other options
Any configuration option offered by Tesseract can be used like that:
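A sketch using the generic `config` method, which takes a variable name and a value (the same form as the `allowlist` shortcut above). `preserve_interword_spaces` is used only as an example of a Tesseract variable; `recognizeKeepingSpaces` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// recognizeKeepingSpaces() is a hypothetical helper, not part of the library.
function recognizeKeepingSpaces(string $imagePath): string
{
    return (new TesseractOCR($imagePath))
        ->config('preserve_interword_spaces', 1) // variable name, then its value
        ->run();
}
```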
Or like that:
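A sketch of the method-style form. It assumes the library maps a camelCase method name to the matching snake_case Tesseract variable, so `preserveInterwordSpaces` stands for `preserve_interword_spaces`; `recognizeKeepingSpacesFluent` is a hypothetical helper name.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Assumes Composer's autoloader was loaded earlier.
// Assumption: camelCase method names are translated to snake_case config variables.
// recognizeKeepingSpacesFluent() is a hypothetical helper, not part of the library.
function recognizeKeepingSpacesFluent(string $imagePath): string
{
    return (new TesseractOCR($imagePath))
        ->preserveInterwordSpaces(1) // intended to set preserve_interword_spaces=1
        ->run();
}
```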
More info: https://github.com/tesseract-ocr/tesseract/wiki/ControlParams
Thread-limit
Sometimes, it may be useful to limit the number of threads that tesseract is allowed to use (e.g. in this case).
Set the maximum number of threads as param for the `run` function:
How to contribute
You can contribute to this project by:
- Opening an [Issue][] if you found a bug or wish to propose a new feature;
- Placing a [Pull Request][] with code that fixes a bug, corrects missing/wrong documentation, or implements a new feature.
Just make sure you take a look at our [Code of Conduct][] and [Contributing][] instructions.
License
tesseract-ocr-for-php is released under the [MIT License][].
Made with ♥ in Berlin
[ci_badge]: https://github.com/thiagoalessio/tesseract-ocr-for-php/workflows/CI/badge.svg?event=push&branch=main
[ci]: https://github.com/thiagoalessio/tesseract-ocr-for-php/actions?query=workflow%3ACI
[appveyor_badge]: https://ci.appveyor.com/api/projects/status/xwy5ls0798iwcim3/branch/main?svg=true
[appveyor]: https://ci.appveyor.com/project/thiagoalessio/tesseract-ocr-for-php/branch/main
[codacy_badge]: https://app.codacy.com/project/badge/Grade/a81aa10012874f23a57df5b492d835f2
[codacy]: https://app.codacy.com/gh/thiagoalessio/tesseract-ocr-for-php/dashboard
[test_coverage_badge]: https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php/branch/main/graph/badge.svg?token=Y0VnrqiSIf
[test_coverage]: https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php
[stable_version_badge]: https://img.shields.io/packagist/v/thiagoalessio/tesseract_ocr.svg
[packagist]: https://packagist.org/packages/thiagoalessio/tesseract_ocr
[total_downloads_badge]: https://img.shields.io/packagist/dt/thiagoalessio/tesseract_ocr.svg
[monthly_downloads_badge]: https://img.shields.io/packagist/dm/thiagoalessio/tesseract_ocr.svg
[Tesseract OCR]: https://github.com/tesseract-ocr/tesseract
[Composer]: http://getcomposer.org/
[windows_icon]: https://thiagoalessio.github.io/tesseract-ocr-for-php/images/windows-18.svg
[macos_icon]: https://thiagoalessio.github.io/tesseract-ocr-for-php/images/apple-18.svg
[tesseract_installation_on_windows]: https://github.com/tesseract-ocr/tesseract/wiki#windows
[Capture2Text]: https://chocolatey.org/packages/capture2text
[Chocolatey]: https://chocolatey.org
[MacPorts]: https://www.macports.org
[Homebrew]: https://brew.sh
[@daijiale]: https://github.com/daijiale
[HOCR]: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#hocr-output
[TSV]: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github
[Issue]: https://github.com/thiagoalessio/tesseract-ocr-for-php/issues
[Pull Request]: https://github.com/thiagoalessio/tesseract-ocr-for-php/pulls
[Code of Conduct]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CODE_OF_CONDUCT.md
[Contributing]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CONTRIBUTING.md
[MIT License]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE