Download the PHP package vaites/php-apache-tika without Composer

On this page you can find all versions of the PHP package vaites/php-apache-tika. You can download and install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After the download, add a single include: require_once('vendor/autoload.php');. Then import the classes you need with use statements.

Example:
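A minimal sketch of that include and import. It assumes the downloaded vendor folder sits next to the script and that the library's main class is Vaites\ApacheTika\Client; the class name is an assumption based on the package's README rather than on this page, so check it against the files you downloaded.

```php
<?php
// Assumption: the downloaded "vendor" folder was unpacked next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';

// Load the autoloader once; skip quietly if the folder is not there yet.
if (is_file($autoload)) {
    require_once $autoload;
}

// Import the class so it can be referenced by its short name below.
use Vaites\ApacheTika\Client;

// True only when the autoloader is in place and the package can be loaded.
$tikaAvailable = class_exists(Client::class);
```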
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, use the auth.json textarea to enter the credentials.

  • Some hosting areas are not accessible by a terminal or SSH. Then it is not possible to use Composer.
  • Using Composer is sometimes complicated, especially for beginners.
  • Composer needs a lot of resources. Sometimes they are not available on simple web hosting.
  • If you are using private repositories you don't need to share your credentials. You can set up everything on our site and then you provide a simple download link to your team member.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as binary. This makes your build process faster and you don't need to expose your credentials for private repositories.

Information about the package php-apache-tika


PHP Apache Tika

This tool provides Apache Tika bindings for PHP, allowing you to extract text and metadata from documents, images and other formats.

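The following modes are supported: app mode (the Tika app JAR run through the command line) and server mode (HTTP requests to a running Tika server).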

Server mode is recommended because it is 5 times faster, but some shared hosts don't allow running processes in the background.

Although the library contains a list of supported versions, any version of Apache Tika should be compatible as long as the Tika team maintains backward compatibility. It is therefore not necessary to wait for a library update to work with new versions of the tool.

Features

Requirements

NOTE: the supported PHP version will remain in sync with the latest versions supported by the PHP team.

Installation

Install using Composer: composer require vaites/php-apache-tika

If you want to use OCR you must install Tesseract:

The library assumes the tesseract binary is in the PATH, so you can compile it yourself or install it using any other method.

Usage

Start Apache Tika server with caution:

If you are using a JRE instead of a JDK and have Java 9 or greater, you must run:

Instantiate the class, checking if JAR exists or server is running:
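A minimal sketch of both ways to instantiate the client. It assumes the library exposes a static Client::make() factory that accepts either a JAR path (app mode) or a host and port (server mode); that signature is recalled from the package's README and is not shown on this page. The helper name makeTikaClient() is hypothetical, and 9998 is the Tika server's default port.

```php
<?php
use Vaites\ApacheTika\Client;

/**
 * Hypothetical helper: build a client for app mode when a JAR path is given,
 * otherwise for server mode using host and port.
 * Assumption: Client::make() accepts these arguments and performs the check.
 */
function makeTikaClient(?string $jarPath = null, string $host = 'localhost', int $port = 9998)
{
    if ($jarPath !== null) {
        // App mode: Tika is run through the command line using the JAR.
        return Client::make($jarPath);
    }

    // Server mode: requests go to a Tika server that is already running.
    return Client::make($host, $port);
}
```

For example, makeTikaClient('/path/to/tika-app-x.xx.jar') would select app mode, while makeTikaClient() would target a server on localhost:9998.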

If you want to use dependency injection, serialize the class or just delay the check:

You can use a URL too:

Use the class to extract text from documents:
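A minimal sketch of a typical extraction call sequence. The method names getLanguage(), getMetadata(), getHTML() and getText() are assumptions based on the package's README, so verify them against the version you installed. The helper describeDocument() is hypothetical and accepts any client object that offers those methods.

```php
<?php
/**
 * Hypothetical helper: collect language, metadata, HTML and plain text
 * for one document using an already created client.
 */
function describeDocument(object $client, string $file): array
{
    return [
        'language' => $client->getLanguage($file), // detected language code
        'metadata' => $client->getMetadata($file), // metadata as returned by the client
        'html'     => $client->getHTML($file),     // document rendered as HTML
        'text'     => $client->getText($file),     // plain text content
    ];
}
```

Keeping the calls behind one function makes it easy to swap the file path for a URL, since the client is described as accepting both.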

Or use it to extract text from images:

You can use a URL instead of a file path and the library will download the file and pass it to Apache Tika. There's no need to add -enableUnsecureFeatures -enableFileUrl to the command line when starting the server.

If you use Apache Tika >= 2.0.0, you can define an HttpFetcher and use the option -enableUnsecureFeatures -enableFileUrl when starting the server to make the server download remote files when passing a URL instead of a filename. In order to do so, you must set the name of the HttpFetcher using $client->setFetcherName('yourFetcherName').
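A minimal sketch of the fetcher setup described above. Only setFetcherName() is taken from this page; getText() is assumed from the package's README, and the helper name getTextViaFetcher() is hypothetical. The fetcher itself must already be defined in the Tika server configuration, and this sketch leaves the internal remote file downloader setting untouched.

```php
<?php
/**
 * Hypothetical helper: name the server-side HttpFetcher, then ask the
 * client for the text of a remote document.
 */
function getTextViaFetcher(object $client, string $fetcherName, string $url): string
{
    // Must match the fetcher name configured on the Tika server (Tika >= 2.0.0).
    $client->setFetcherName($fetcherName);

    // Assumption: getText() accepts a URL as well as a local file path.
    return $client->getText($url);
}
```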

Methods

Here is the full list of available methods.

Common

Tika file related methods:

Other Tika related methods:

Encoding methods:

Supported versions related methods:

Set/get a callback for sequential read of response:

Set/get the chunk size for sequential read:
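A minimal sketch of a sequential read that combines a callback with a chunk size. The method names setCallback() and setChunkSize(), and the idea that the callback receives each chunk as a string, are assumptions based on the package's README; they are not confirmed on this page, and the chunk size may not apply to every mode. The helper name streamText() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: read the text of $file sequentially, handing each
 * chunk to $onChunk. The chunk size is only changed when one is given.
 */
function streamText(object $client, string $file, callable $onChunk, ?int $chunkSize = null): void
{
    if ($chunkSize !== null) {
        // Assumption: the size is expressed in bytes.
        $client->setChunkSize($chunkSize);
    }

    // Assumption: the callback is invoked once per chunk while reading.
    $client->setCallback($onChunk);

    // The return value is ignored here; the chunks arrive through the callback.
    $client->getText($file);
}
```

A caller could pass a closure that appends each chunk to a file or echoes it, which avoids holding a large document in memory at once.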

Enable/disable the internal remote file downloader:

Set the fetcher name:

Command line client

Set/get JAR/Java paths (only CLI mode):

Web client

Set/get host properties

Set/get cURL client options

Set/get timeout:

Set/get HTTP headers (see TikaServer):

Set/get OCR languages (see TikaOCR):

Set HTTP fetcher name (for Tika >= 2.0.0 only, see https://cwiki.apache.org/confluence/display/TIKA/tika-pipes)

Breaking changes

Since version 1.0 there are some breaking changes:

See CHANGELOG.md for more details.

Troubleshooting

Empty responses or unexpected results

This library is only a proxy, so if you get empty responses or unexpected results, the most common cause is Tika itself. A simple test is to use the GUI to check the response:

  1. Run the Tika app without arguments: java -jar tika-app-x.xx.jar
  2. Drop your file or select it using File -> Open
  3. Wait until the metadata appears
  4. Get the text or HTML using View menu

If the results are the same, take a look at Tika's Jira and open an issue if necessary.

Encoding

By default the returned text is encoded as UTF-8, and the Client::setEncoding() method allows you to set the expected encoding.
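A minimal sketch of setting the encoding before extracting text. setEncoding() is named on this page; getText() is assumed from the package's README, and the helper name getTextWithEncoding() is hypothetical. Any encoding value passed in is only an illustration; use whatever your documents require.

```php
<?php
/**
 * Hypothetical helper: set the expected encoding on the client and then
 * return the extracted text of $file.
 */
function getTextWithEncoding(object $client, string $file, string $encoding = 'UTF-8'): string
{
    // UTF-8 is the documented default; pass another value to override it.
    $client->setEncoding($encoding);

    return $client->getText($file);
}
```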

Tests

Tests are designed to cover all features for all supported versions of Apache Tika in app mode and server mode. There are a few samples to test against:

Known issues

Some issues were found during tests that are not related to this library:

Integrations


All versions of php-apache-tika with dependencies

Requires:
  • php: >=7.3.0
  • ext-curl: *
