Download the PHP package thiagoalessio/tesseract_ocr without Composer

This page lists all versions of the PHP package thiagoalessio/tesseract_ocr. You can download and install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After the download, you need a single include: require_once('vendor/autoload.php');. After that, import the classes you need with use statements.

Example:
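A minimal sketch of that include-then-import step, assuming the downloaded archive was unpacked so that a `vendor` directory sits next to your script. The helper name `loadVendorAutoloader` is hypothetical and only makes the missing-folder case explicit.

```php
<?php
// `use` only creates an alias for the class name; nothing is loaded
// until the autoloader has been required.
use thiagoalessio\TesseractOCR\TesseractOCR;

/**
 * Hypothetical helper: require the autoloader from the downloaded
 * vendor folder and fail loudly if it is not where we expect it.
 */
function loadVendorAutoloader($vendorDir)
{
    $autoload = rtrim($vendorDir, '/\\') . '/autoload.php';
    if (!is_file($autoload)) {
        throw new RuntimeException("Autoloader not found at: $autoload");
    }
    require_once $autoload;
}

// Typical use (assumes ./vendor exists and Tesseract is installed):
// loadVendorAutoloader(__DIR__ . '/vendor');
// echo (new TesseractOCR('text.png'))->run();
```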
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.

Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, use the auth.json textarea to insert your credentials.

  • Some hosting environments offer no terminal or SSH access, so Composer cannot be used there.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then give your team members a simple download link.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package tesseract_ocr

Tesseract OCR for PHP

A wrapper to work with Tesseract OCR inside PHP.

[![CI][ci_badge]][ci] [![AppVeyor][appveyor_badge]][appveyor] [![Codacy][codacy_badge]][codacy] [![Test Coverage][test_coverage_badge]][test_coverage]
[![Latest Stable Version][stable_version_badge]][packagist] [![Total Downloads][total_downloads_badge]][packagist] [![Monthly Downloads][monthly_downloads_badge]][packagist]

Installation

Via [Composer][]:

$ composer require thiagoalessio/tesseract_ocr

:bangbang: This library depends on [Tesseract OCR][], version 3.02 or later.


![][windows_icon] Note for Windows users

There are [many ways][tesseract_installation_on_windows] to install [Tesseract OCR][] on your system, but if you just want something quick to get up and running, I recommend installing the [Capture2Text][] package with [Chocolatey][].

choco install capture2text --version 3.9

:warning: Recent versions of [Capture2Text][] stopped shipping the tesseract binary.


![][macos_icon] Note for macOS users

With [MacPorts][] you can install support for individual languages, like so:

$ sudo port install tesseract-<langcode>

But that is not possible with [Homebrew][]. It comes with English support only by default, so if you intend to use it for other languages, the quickest solution is to install them all:

$ brew install tesseract tesseract-lang


Usage

Basic usage
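A minimal sketch of the basic call, assuming the autoloader (`vendor/autoload.php`) is already loaded and the `tesseract` binary is on the `PATH`. The wrapper name `ocrBasic` and the file name are placeholders.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: recognize the text in a single image file.
function ocrBasic($imagePath)
{
    // The constructor takes the image path; run() executes tesseract
    // and returns the recognized text as a string.
    return (new TesseractOCR($imagePath))->run();
}

// Example call (placeholder file name):
// echo ocrBasic('text.png');
```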


Other languages
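A sketch for a non-English image, assuming the German language data (`deu`) is installed for Tesseract and the autoloader is already loaded. The wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: recognize German text.
function ocrGerman($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->lang('deu') // Tesseract language code for German
        ->run();
}

// echo ocrGerman('german.png'); // placeholder file name
```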


Multiple languages
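When an image mixes several languages, `lang` can be given more than one code, as described in the API section. The sketch assumes the `eng`, `jpn` and `spa` language data are installed; the wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: recognize text that mixes three languages.
function ocrMixedLanguages($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->lang('eng', 'jpn', 'spa') // one argument per language code
        ->run();
}

// echo ocrMixedLanguages('mixed-languages.png'); // placeholder file name
```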


Inducing recognition
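One way to read "inducing recognition" is nudging tesseract toward the character set you expect. The sketch below assumes an image whose glyphs are ambiguous between digits and letters, and restricts the output to uppercase letters with `allowlist` (see the API section). The wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: only allow uppercase letters in the result.
function ocrUppercaseOnly($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->allowlist(range('A', 'Z'))
        ->run();
}

// echo ocrUppercaseOnly('ambiguous.png'); // placeholder file name
```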


Breaking CAPTCHAs

Yes, I know some of you might want to use this library for the noble purpose of breaking CAPTCHAs, so please take a look at this comment:

https://github.com/thiagoalessio/tesseract-ocr-for-php/issues/91#issuecomment-342290510

API

run

Executes a tesseract command, optionally receiving an integer as timeout, in case you experience stalled tesseract processes.
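A sketch of both call forms. The text does not state the unit of the timeout, so the value is passed through unchanged; check the library before relying on a specific unit. The wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: run with an optional integer timeout.
function ocrWithTimeout($imagePath, $timeout = null)
{
    $ocr = new TesseractOCR($imagePath);

    if ($timeout === null) {
        return $ocr->run();       // no timeout given
    }
    return $ocr->run($timeout);   // guard against stalled processes
}

// echo ocrWithTimeout('text.png', 500); // placeholder values
```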

image

Define the path of an image to be recognized by tesseract.

imageData

Set the image to be recognized by tesseract from a string, together with its size. This can be useful when dealing with files that are already loaded in memory. You can easily retrieve the image data and size from an image object:
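A sketch using the GD extension, assuming `$gdImage` is an image created or loaded with GD. The helper names are hypothetical; the data is captured by buffering the output of `imagepng()`.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical helper: encode a GD image as PNG in memory and
// return the raw bytes together with their length.
function gdImageToPngData($gdImage)
{
    ob_start();               // capture what imagepng() writes to output
    imagepng($gdImage);
    $data = ob_get_clean();   // buffer contents as a binary string
    return array($data, strlen($data));
}

// Hypothetical wrapper: recognize text from an in-memory GD image.
function ocrFromGdImage($gdImage)
{
    list($data, $size) = gdImageToPngData($gdImage);
    return (new TesseractOCR())
        ->imageData($data, $size)
        ->run();
}
```

With Imagick, `getImageBlob()` should supply the same kind of byte string.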

executable

Define a custom location of the tesseract executable, if for any reason it is not present in the $PATH.

version

Returns the current version of tesseract.

availableLanguages

Returns a list of available languages/scripts.

More info: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages-and-scripts
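A sketch that uses the returned list to check for a language before running recognition. It assumes `availableLanguages()` returns an array of language codes as strings; the helper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical helper: is the given language code installed?
function isLanguageInstalled($langCode)
{
    $languages = (new TesseractOCR())->availableLanguages();
    return in_array($langCode, $languages, true);
}

// if (!isLanguageInstalled('deu')) { /* install the language data first */ }
```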

tessdataDir

Specify a custom location for the tessdata directory.

userWords

Specify the location of user words file.

This is a plain text file containing a list of words that you want tesseract to treat as normal dictionary words.

Useful when dealing with contents that contain technical terminology, jargon, etc.
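A sketch that writes such a word list (one word per line) and passes it to `userWords`. Both function names and the sample words are hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical helper: write a user words file, one word per line.
function writeUserWords($path, array $words)
{
    file_put_contents($path, implode("\n", $words) . "\n");
    return $path;
}

// Hypothetical wrapper: recognize text with the extra vocabulary.
function ocrWithUserWords($imagePath, $wordsFile)
{
    return (new TesseractOCR($imagePath))
        ->userWords($wordsFile)
        ->run();
}

// $file = writeUserWords('/tmp/user-words.txt', array('foo', 'bar'));
// echo ocrWithUserWords('img.png', $file);
```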

userPatterns

Specify the location of user patterns file.

If the contents you are dealing with have known patterns, this option can greatly improve tesseract's recognition accuracy.

lang

Define one or more languages to be used during the recognition. A complete list of available languages can be found at: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

Tip from [@daijiale][]: Use the combination ->lang('chi_sim', 'chi_tra') for proper recognition of Chinese.

psm

Specify the Page Segmentation Method, which instructs tesseract how to interpret the given image.

More info: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
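A sketch that selects a segmentation mode. Mode 7 is listed by `tesseract --help-psm` as treating the image as a single text line; the wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: the image contains exactly one line of text.
function ocrSingleLine($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->psm(7) // 7 = treat the image as a single text line
        ->run();
}

// echo ocrSingleLine('one-line.png'); // placeholder file name
```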

oem

Specify the OCR Engine Mode. (see tesseract --help-oem)

dpi

Specify the image DPI. It is useful if your image does not contain this information in its metadata.

allowlist

This is a shortcut for ->config('tessedit_char_whitelist', 'abcdef....').
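To make the shortcut concrete, the sketch below spells out the longer `config` form. `buildCharList` is a hypothetical helper that flattens arrays and strings into one string of allowed characters; passing the same arguments straight to `->allowlist(...)` is assumed to be equivalent.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical helper: flatten a mix of arrays and strings
// into a single string of characters.
function buildCharList()
{
    $parts = array_map(function ($arg) {
        return is_array($arg) ? implode('', $arg) : (string) $arg;
    }, func_get_args());
    return implode('', $parts);
}

// Hypothetical wrapper: restrict recognition to the given characters.
function ocrWithCharList($imagePath, $chars)
{
    return (new TesseractOCR($imagePath))
        ->config('tessedit_char_whitelist', $chars)
        ->run();
}

// $chars = buildCharList(range('a', 'z'), range(0, 9), '-_@');
// echo ocrWithCharList('img.png', $chars);
```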

configFile

Specify a config file to be used. It can either be the path to your own config file or the name of one of the predefined config files: https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs

setOutputFile

Specify an output file to be used. Be aware: if you set an output file, the withoutTempFiles option is ignored. Temp files are written (and deleted) even if withoutTempFiles is set.

In combination with configFile you are able to get the hocr, tsv or pdf files.
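A sketch that combines the two options to produce a PDF. It assumes the path given to `setOutputFile` is the full target path including the extension; the wrapper name and paths are hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: write the recognition result as a PDF file.
function ocrToPdf($imagePath, $pdfPath)
{
    (new TesseractOCR($imagePath))
        ->configFile('pdf')        // ask tesseract for PDF output
        ->setOutputFile($pdfPath)  // where the result should be written
        ->run();

    // Report whether the file actually appeared on disk.
    return is_file($pdfPath);
}

// ocrToPdf('img.png', '/tmp/searchable.pdf'); // placeholder paths
```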

digits

Shortcut for ->configFile('digits').

hocr

Shortcut for ->configFile('hocr').

pdf

Shortcut for ->configFile('pdf').

quiet

Shortcut for ->configFile('quiet').

tsv

Shortcut for ->configFile('tsv').

txt

Shortcut for ->configFile('txt').

tempDir

Define a custom directory to store temporary files generated by tesseract. Make sure the directory actually exists and the user running php is allowed to write in there.

withoutTempFiles

Specify that tesseract should output the recognized text without writing to temporary files. The data is gathered from the standard output of tesseract instead.
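A sketch for environments where writing temporary files is undesirable, for example a read-only filesystem. The wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: read the result from tesseract's standard
// output instead of a temporary file.
function ocrViaStdout($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->withoutTempFiles()
        ->run();
}

// echo ocrViaStdout('img.png'); // placeholder file name
```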

Other options

Any configuration option offered by Tesseract can be used like this:
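A sketch of the explicit form, using `config` (the method the `allowlist` shortcut refers to) with two Tesseract parameters chosen purely as illustrations. The wrapper name is hypothetical.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: pass arbitrary Tesseract parameters by name.
function ocrWithConfig($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->config('preserve_interword_spaces', '1') // keep spacing between words
        ->config('tessedit_char_blacklist', '|')   // never output a pipe character
        ->run();
}

// echo ocrWithConfig('img.png'); // placeholder file name
```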

Or like this:
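The original sample is missing here. The sketch below assumes the second form is a method call named after the parameter in camelCase, which the library is assumed to translate to the snake_case parameter name. Treat this as an assumption and verify it against the library before use.

```php
<?php
use thiagoalessio\TesseractOCR\TesseractOCR;

// Hypothetical wrapper: same parameters, written as camelCase methods.
function ocrWithMagicConfig($imagePath)
{
    return (new TesseractOCR($imagePath))
        ->preserveInterwordSpaces('1') // assumed to map to preserve_interword_spaces
        ->tesseditCharBlacklist('|')   // assumed to map to tessedit_char_blacklist
        ->run();
}

// echo ocrWithMagicConfig('img.png'); // placeholder file name
```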

More info: https://github.com/tesseract-ocr/tesseract/wiki/ControlParams

Thread-limit

Sometimes, it may be useful to limit the number of threads that tesseract is allowed to use (e.g. in this case). Set the maximum number of threads as a parameter for the run function:

How to contribute

You can contribute to this project by:

Just make sure you take a look at our [Code of Conduct][] and [Contributing][] instructions.

License

tesseract-ocr-for-php is released under the [MIT License][].

Made with love in Berlin

[ci_badge]: https://github.com/thiagoalessio/tesseract-ocr-for-php/workflows/CI/badge.svg?event=push&branch=main
[ci]: https://github.com/thiagoalessio/tesseract-ocr-for-php/actions?query=workflow%3ACI
[appveyor_badge]: https://ci.appveyor.com/api/projects/status/xwy5ls0798iwcim3/branch/main?svg=true
[appveyor]: https://ci.appveyor.com/project/thiagoalessio/tesseract-ocr-for-php/branch/main
[codacy_badge]: https://app.codacy.com/project/badge/Grade/a81aa10012874f23a57df5b492d835f2
[codacy]: https://app.codacy.com/gh/thiagoalessio/tesseract-ocr-for-php/dashboard
[test_coverage_badge]: https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php/branch/main/graph/badge.svg?token=Y0VnrqiSIf
[test_coverage]: https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php
[stable_version_badge]: https://img.shields.io/packagist/v/thiagoalessio/tesseract_ocr.svg
[packagist]: https://packagist.org/packages/thiagoalessio/tesseract_ocr
[total_downloads_badge]: https://img.shields.io/packagist/dt/thiagoalessio/tesseract_ocr.svg
[monthly_downloads_badge]: https://img.shields.io/packagist/dm/thiagoalessio/tesseract_ocr.svg
[Tesseract OCR]: https://github.com/tesseract-ocr/tesseract
[Composer]: http://getcomposer.org/
[windows_icon]: https://thiagoalessio.github.io/tesseract-ocr-for-php/images/windows-18.svg
[macos_icon]: https://thiagoalessio.github.io/tesseract-ocr-for-php/images/apple-18.svg
[tesseract_installation_on_windows]: https://github.com/tesseract-ocr/tesseract/wiki#windows
[Capture2Text]: https://chocolatey.org/packages/capture2text
[Chocolatey]: https://chocolatey.org
[MacPorts]: https://www.macports.org
[Homebrew]: https://brew.sh
[@daijiale]: https://github.com/daijiale
[HOCR]: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#hocr-output
[TSV]: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github
[Issue]: https://github.com/thiagoalessio/tesseract-ocr-for-php/issues
[Pull Request]: https://github.com/thiagoalessio/tesseract-ocr-for-php/pulls
[Code of Conduct]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CODE_OF_CONDUCT.md
[Contributing]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CONTRIBUTING.md
[MIT License]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE

All versions of tesseract_ocr with dependencies

Requires PHP version: ^5.3 || ^7.0 || ^8.0
