Download the PHP package hocvt/language-detection without Composer

On this page you can find all versions of the php package hocvt/language-detection. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After the download, you need a single include: require_once('vendor/autoload.php'); After that, import the classes with use statements.

If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In that case, credentials are needed to access them. If a package comes from a private repository, please use the auth.json textarea to insert the credentials.

  • Some hosting environments cannot be accessed by terminal or SSH, so Composer cannot be used there.
  • Composer is sometimes complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple webspace.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package language-detection

language-detection


This library can detect the language of a given text string. In the training phase it parses training text in many different languages into sequences of n-grams and builds a database file in JSON format. In the detection phase it takes a given text and detects its language using the database generated during training. The library comes with text samples used for training and detecting text in 110 languages.
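To make the training step more concrete, here is a minimal sketch of how a text could be turned into a ranked n-gram profile. The function name `buildProfile`, the underscore padding, and the n-gram lengths 1 to 3 are assumptions made for illustration; the library's own trainer may differ in all of these details. The limit of 310 matches the default number of n-grams mentioned in the FAQ further down. The sketch uses the Multibyte String extension and is written for PHP 7.1 or later.

```php
<?php
declare(strict_types=1);

/**
 * Build a ranked n-gram profile for one text.
 * Hypothetical helper for illustration, not part of the library.
 *
 * @return string[] n-grams ordered from most to least frequent
 */
function buildProfile(string $text, int $minLen = 1, int $maxLen = 3, int $maxNgrams = 310): array
{
    $counts = [];

    // Lowercase the text and split it on anything that is not a letter.
    $words = preg_split('/[^\pL]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Pad each word so that word boundaries become part of the n-grams.
        $padded = '_' . $word . '_';
        $length = mb_strlen($padded);

        for ($n = $minLen; $n <= $maxLen; $n++) {
            for ($i = 0; $i + $n <= $length; $i++) {
                $gram = mb_substr($padded, $i, $n);
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }

    // Most frequent n-grams first, then keep only the best $maxNgrams.
    arsort($counts);

    return array_slice(array_keys($counts), 0, $maxNgrams);
}

// Keep only the five most frequent n-grams of a tiny sample.
print_r(buildProfile('banana banana', 1, 3, 5));
```

A profile like this would be stored once per language during training, and the same function would be applied to the input text at detection time.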


Installation with Composer

Note: This library requires the Multibyte String extension in order to work.

Basic Usage

To detect the language correctly, the input text should be at least a few sentences long.


API

__construct(array $result = [], string $dirname = '')

You can pass an array of languages to the constructor so that the input is compared only with those languages. This can dramatically increase performance. The second parameter is optional and names the directory where the translation files are located.


whitelist(string ...$whitelist)

Provide a whitelist. Returns only the languages named in the whitelist.



blacklist(string ...$blacklist)

Provide a blacklist. Removes the given languages from the result.
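The filtering that whitelist() and blacklist() describe can be sketched with plain arrays. The helper names `keepOnly` and `dropAll` and the score values are hypothetical and only illustrate the idea; this is not the library's implementation, and a real result object is used through the methods documented here.

```php
<?php
declare(strict_types=1);

/** Keep only the listed language codes (whitelist behaviour). */
function keepOnly(array $scores, string ...$codes): array
{
    return array_intersect_key($scores, array_flip($codes));
}

/** Drop the listed language codes (blacklist behaviour). */
function dropAll(array $scores, string ...$codes): array
{
    return array_diff_key($scores, array_flip($codes));
}

// Illustrative scores only; they are not real output of the library.
$scores = ['de' => 0.41, 'nl' => 0.36, 'en' => 0.33, 'fr' => 0.29];

print_r(keepOnly($scores, 'de', 'nl', 'af')); // 'af' has no score, so it is ignored
print_r(dropAll($scores, 'en', 'fr'));
```

Both helpers preserve the order of the input array, so a list that is sorted by score stays sorted after filtering.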



bestResults()

Returns the best results.



limit(int $offset, int $length = null)

You can specify an offset and the number of records to return. For example, an offset of 0 and a length of 3 return the top three entries.
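The offset and length semantics match those of PHP's `array_slice`, so the behaviour can be sketched on a plain array. The function `limitScores` and the scores are hypothetical; whether the library uses `array_slice` internally is an assumption.

```php
<?php
declare(strict_types=1);

/** Return $length entries starting at $offset, keeping the language codes as keys. */
function limitScores(array $scores, int $offset, ?int $length = null): array
{
    return array_slice($scores, $offset, $length, true);
}

// Illustrative scores, already sorted from best to worst match.
$scores = ['nl' => 0.66, 'af' => 0.51, 'de' => 0.42, 'da' => 0.40, 'sv' => 0.38];

print_r(limitScores($scores, 0, 3)); // the top three entries
print_r(limitScores($scores, 3));    // everything after the top three
```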



close()

Returns the result as an array.



setTokenizer(TokenizerInterface $tokenizer)

The library uses a tokenizer to split a sentence into words. You can define your own tokenizer, for example to handle numbers.

A custom tokenizer can, for instance, return only lowercase letters of the alphabet and the digits 0 to 9.
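A tokenizer of that kind could look like the sketch below. The interface declared here is a local stand-in so that the example runs without the library installed; its method name `tokenize` and its signature are assumptions, and with the real library you would implement the library's own `TokenizerInterface` and pass the object to setTokenizer().

```php
<?php
declare(strict_types=1);

// Local stand-in for the library's interface; name and signature are assumptions.
interface TokenizerInterface
{
    public function tokenize(string $str): array;
}

$tokenizer = new class implements TokenizerInterface {
    public function tokenize(string $str): array
    {
        // Lowercase first so that a capital letter does not split a word,
        // then split on every run of characters outside a-z and 0-9.
        return preg_split('/[^a-z0-9]+/', strtolower($str), -1, PREG_SPLIT_NO_EMPTY);
    }
};

print_r($tokenizer->tokenize('Room 42 is free!'));
```

Because the pattern only covers ASCII, letters with diacritics act as separators here. A Unicode-aware pattern would be needed for most real languages.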


__toString()

Returns the top entry of the result, so the result object can be passed directly to echo.



jsonSerialize()

Serializes the data to JSON.
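Both __toString() and jsonSerialize() are standard PHP hooks, so their effect can be shown with a small stand-in class. `DetectionResult` and its scores are hypothetical and are not the library's result class; the sketch only shows how echo and json_encode() pick up these two methods.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a result object holding scores sorted best-first. */
final class DetectionResult implements JsonSerializable
{
    /** @var array<string, float> */
    private $scores;

    public function __construct(array $scores)
    {
        $this->scores = $scores;
    }

    /** Called by echo and string casts: the code of the top entry. */
    public function __toString(): string
    {
        return (string) (array_keys($this->scores)[0] ?? '');
    }

    /** Called by json_encode(): the full score list. */
    public function jsonSerialize(): array
    {
        return $this->scores;
    }
}

$result = new DetectionResult(['de' => 0.5, 'nl' => 0.25]);

echo $result, PHP_EOL;              // uses __toString()
echo json_encode($result), PHP_EOL; // uses jsonSerialize()
```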



Method chaining

You can also combine methods with each other. For example, chaining blacklist() and limit() removes all entries specified in the blacklist and then returns only the top four of the remaining entries.
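Chaining works when each method returns a result object again. The class `ScoreList` below is a hypothetical stand-in that mimics only blacklist(), limit() and close(); the scores are made up and the library's own class may be built differently.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in; only the chaining pattern matters here. */
final class ScoreList
{
    /** @var array<string, float> */
    private $scores;

    public function __construct(array $scores)
    {
        arsort($scores); // best match first
        $this->scores = $scores;
    }

    public function blacklist(string ...$codes): self
    {
        return new self(array_diff_key($this->scores, array_flip($codes)));
    }

    public function limit(int $offset, ?int $length = null): self
    {
        return new self(array_slice($this->scores, $offset, $length, true));
    }

    public function close(): array
    {
        return $this->scores;
    }
}

$list = new ScoreList([
    'nl' => 0.66, 'af' => 0.51, 'de' => 0.42, 'sv' => 0.40,
    'da' => 0.38, 'en' => 0.31, 'fr' => 0.22,
]);

// Remove two languages, then keep the top four of what is left.
print_r($list->blacklist('af', 'sv')->limit(0, 4)->close());
```

Each call returns a new object instead of modifying the existing one, so the original list can be filtered again in a different way.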



ArrayAccess

You can also access the object directly as an array.
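Array-style access on an object relies on PHP's built-in `ArrayAccess` interface. The class `ScoreMap` below is a hypothetical stand-in with made-up scores; it shows the mechanism only and is not the library's result class. It is written for PHP 7.1 or later.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in that exposes scores through array syntax. */
final class ScoreMap implements ArrayAccess
{
    /** @var array<string, float> */
    private $scores;

    public function __construct(array $scores)
    {
        $this->scores = $scores;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->scores[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        // Unknown language codes yield null instead of a notice.
        return $this->scores[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->scores[] = $value;
        } else {
            $this->scores[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->scores[$offset]);
    }
}

$map = new ScoreMap(['de' => 0.5, 'nl' => 0.25]);

echo $map['de'], PHP_EOL;  // score stored for German
var_dump(isset($map['fr'])); // no score stored for French
```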



Supported languages

The library currently supports 110 languages. For an overview of all supported languages, please have a look here.


Other languages

The library is trainable, which means you can change, remove and add your own language files. If your language is not supported, feel free to add your own language files. To do that, create a new directory in resources and add your training text to it.

Note: The training text should be a .txt file.

Example

The same mechanism can also be used for other classification tasks, for example to tell spam from ham.

If you store your translation files outside of resources, you have to specify the path.

Whenever you change one of the translation files, you must first generate a language profile for it. This may take a few seconds.

Remove the training code after it has run once. You can then classify texts by their language with your own training text.


FAQ

How can I improve the detection phase?

To improve the detection phase you have to use more n-grams. But be careful: this will slow down the script. I found that detection is much better with around 9,000 n-grams (the default is 310).

First you have to train the library again with the new n-gram count. After that you can classify texts as before, but you must also specify how many n-grams you want to use when detecting.


Is the detection process slower if language files are very big?

No, it is not. The trainer class only uses the best 310 n-grams of each language. As long as you don't change this number or add more language files, performance is not affected. Only creating the n-grams is slower, and that has to be done only once. The detection phase is affected only when you try to detect large chunks of text.

Summary: The training phase will be slower but the detection phase remains the same.
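The reason detection time does not grow with the size of the training files can be seen from how ranked n-gram profiles are typically compared. The sketch below uses the out-of-place measure from Cavnar and Trenkle, a common choice for n-gram language detection. Whether this library scores in exactly this way is an assumption, and the function name `outOfPlace` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Out-of-place distance between a sample profile and a language profile.
 * Both are lists of n-grams ordered from most to least frequent.
 * A smaller distance means a better match.
 */
function outOfPlace(array $sample, array $language, int $maxNgrams): int
{
    $rankOf = array_flip($language); // n-gram => rank in the language profile
    $distance = 0;

    foreach (array_values($sample) as $rank => $gram) {
        // Known n-gram: how far its rank moved. Unknown n-gram: maximum penalty.
        $distance += isset($rankOf[$gram]) ? abs($rank - $rankOf[$gram]) : $maxNgrams;
    }

    return $distance;
}

// Tiny hand-made profiles for illustration only.
$german = ['e', 'n', 'er', 'ch', 'ei'];
$sample = ['e', 'er', 'n', 'x'];

echo outOfPlace($sample, $german, 310), PHP_EOL;
```

The loop runs once per n-gram in the sample, and each language profile is capped at the configured number of n-grams. The cost therefore depends on the length of the input text, the n-gram limit and the number of languages, not on how much text was used for training.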

Contributing

Feel free to contribute. Any help is welcome.

License

This project is licensed under the terms of the MIT license.


All versions of language-detection with dependencies

Requires:
  • php: ^7
  • ext-mbstring: *
