Download the PHP package nitotm/efficient-language-detector without Composer

On this page you can find all versions of the PHP package nitotm/efficient-language-detector. These versions can be downloaded and installed without Composer. Any dependencies are resolved automatically.

FAQ

After the download, add a single include, require_once('vendor/autoload.php');, and then import the classes you need with use statements.

Example:
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In that case, credentials are needed to access them. If a package comes from a private repository, insert the credentials in the auth.json textarea.

  • Some hosting environments offer no terminal or SSH access, so Composer cannot be used there.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you use private repositories, you don't need to share your credentials. You can set up everything on our site and then give your team members a simple download link.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package efficient-language-detector

Efficient Language Detector

![supported PHP versions](https://img.shields.io/badge/PHP-%3E%3D%207.4-blue) [![license](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0) [![supported languages](https://img.shields.io/badge/supported%20languages-60-brightgreen.svg)](#languages) ![version](https://img.shields.io/badge/ver.-3.0-blue)

Efficient language detector (Nito-ELD or ELD) is fast and accurate natural language detection software, written 100% in PHP, with a speed comparable to fast compiled C++ detectors and accuracy within the range of the best detectors to date.

It has no dependencies and is easy to install; all that is needed is PHP with the mbstring extension.
Outdated versions of ELD are also available in JavaScript and Python.

  1. Installation
  2. How to use
  3. Benchmarks
  4. Databases
  5. Testing
  6. Languages

Changes from ELD v2 to v3:

  • detect()->language now returns string 'und' for undetermined instead of NULL
  • Databases are not compatible and are bigger: medium v2 ≈ small v3
  • dynamicLangSubset() function is removed
  • Function cleanText() is now named enableTextCleanup()

Installation

Configuration

It is recommended to use OPcache, especially with the larger databases, to reduce load times.
opcache.interned_strings_buffer and opcache.memory_consumption must be set high enough for the chosen database.
Recommended values are shown in parentheses. Check Databases for more info.

| php.ini setting | Small | Medium | Large | Extralarge |
|---|---|---|---|---|
| memory_limit | >= 128 | >= 340 | >= 1060 | >= 2200 |
| opcache.interned_strings_buffer | >= 8 (16) | >= 16 (32) | >= 60 (70) | >= 116 (128) |
| opcache.memory_consumption | >= 64 (128) | >= 128 (230) | >= 360 (450) | >= 750 (820) |
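
As an illustration, a php.ini fragment for the Medium database could look like the sketch below. It takes the recommended OPcache values and the minimum memory_limit from the table above, and assumes every number in the table is in megabytes (OPcache directives take plain MB values, while memory_limit needs the M suffix). Adjust the numbers for another database size.

```ini
; Sketch for the Medium database, values taken from the table above
memory_limit = 340M
opcache.interned_strings_buffer = 32
opcache.memory_consumption = 230
```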

How to use?

detect() expects a UTF-8 string and returns an object with a language property, containing an ISO 639-1 code (or other selected format), or 'und' for undetermined language.
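
A minimal sketch of handling the result, relying only on what is stated above: detect() returns an object whose language property holds either a code or 'und'. The helper name detectOrFallback() and the fallback value are hypothetical and not part of ELD, and the class name in the commented usage is an assumption about the package's namespace that this page does not confirm.

```php
<?php
/**
 * Return the detected language code, or $fallback when the detector
 * reports 'und' (undetermined). Hypothetical helper, not part of ELD.
 *
 * $detector is any object with a detect(string) method that returns
 * an object exposing a "language" property, as described above.
 */
function detectOrFallback(object $detector, string $text, string $fallback = 'en'): string
{
    $code = $detector->detect($text)->language;

    return $code === 'und' ? $fallback : $code;
}

// Usage with the real library (assumed class name, requires the package):
// require_once 'vendor/autoload.php';
// $eld = new Nitotm\Eld\LanguageDetector();
// echo detectOrFallback($eld, 'Hola, cómo te llamas?');
```

Comparing against the string 'und' rather than NULL matches the v3 behaviour listed in the changes above.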

Language subsets

Calling langSubset() once sets the subset. The first call takes longer because it creates a new database; if the database file is saved (the default), it will be loaded the next time the same subset is requested.
To use a subset without additional overhead, the proper way is to instantiate the detector with the file saved and returned by langSubset(). Check available Languages below.

Other Functions

Benchmarks

I compared ELD with a variety of detectors written in different languages, as there are not many in PHP.

| URL | Version | Language |
|---|---|---|
| https://github.com/nitotm/efficient-language-detector/ | 3.0.0 | PHP |
| https://github.com/pemistahl/lingua-py | 2.0.2 | Python |
| https://github.com/facebookresearch/fastText | 0.9.2 | C++ |
| https://github.com/CLD2Owners/cld2 | Aug 21, 2015 | C++ |
| https://github.com/patrickschur/language-detection | 5.3.0 | PHP |
| https://github.com/wooorm/franc | 7.2.0 | Javascript |

Benchmarks:

[Images: time table; accuracy table]

Databases

| | Small | Medium | Large | Extralarge |
|---|---|---|---|---|
| Pros | Lowest memory | Equilibrated | Fastest | Most accurate |
| Cons | Least accurate | Slowest (but fast) | High memory | Highest memory |
| File size | 3 MB | 10 MB | 32 MB | 71 MB |
| Memory usage | 76 MB | 280 MB | 977 MB | 2083 MB |
| Memory usage Cached | 0.4 MB + OP | 0.4 MB + OP | 0.4 MB + OP | 0.4 MB + OP |
| OPcache used memory | 21 MB | 69 MB | 244 MB | 539 MB |
| OPcache used interned | 4 MB | 10 MB | 45 MB | 98 MB |
| Load time Uncached | 0.14 sec | 0.5 sec | 1.5 sec | 3.4 sec |
| Load time Cached | 0.0002 sec | 0.0002 sec | 0.0002 sec | 0.0002 sec |
| Settings (Recommended) | | | | |
| memory_limit | >= 128 | >= 340 | >= 1060 | >= 2200 |
| opcache.interned...* | >= 8 (16) | >= 16 (32) | >= 60 (70) | >= 116 (128) |
| opcache.memory | >= 64 (128) | >= 128 (230) | >= 360 (450) | >= 750 (820) |
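
To check whether an environment meets the minimum settings listed above, something like the following sketch could be used. It relies only on PHP's built-in ini_get(). The constant ELD_MINIMUMS_MB and both function names are hypothetical, the thresholds are copied from the Settings rows, and all values are assumed to be megabytes.

```php
<?php
// Minimum php.ini values per database size, copied from the table above.
// Assumption: every number in the table is in megabytes.
const ELD_MINIMUMS_MB = [
    'small'      => ['memory_limit' => 128,  'opcache.interned_strings_buffer' => 8,   'opcache.memory_consumption' => 64],
    'medium'     => ['memory_limit' => 340,  'opcache.interned_strings_buffer' => 16,  'opcache.memory_consumption' => 128],
    'large'      => ['memory_limit' => 1060, 'opcache.interned_strings_buffer' => 60,  'opcache.memory_consumption' => 360],
    'extralarge' => ['memory_limit' => 2200, 'opcache.interned_strings_buffer' => 116, 'opcache.memory_consumption' => 750],
];

/** Convert a php.ini size such as "512M", "1G" or "-1" (unlimited) to megabytes. */
function iniSizeToMb(string $value): float
{
    $value = trim($value);
    if ($value === '-1') {
        return INF; // memory_limit = -1 means no limit
    }
    $number = (float) $value; // the cast reads the leading digits, e.g. "512M" gives 512.0
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024;
        case 'M':
            return $number;
        case 'K':
            return $number / 1024;
        default:
            return $number / (1024 * 1024); // no suffix: plain bytes
    }
}

/** Return the settings (with their minimum) that are too low for $database. */
function eldSettingsBelowMinimum(string $database, array $currentMb): array
{
    $tooLow = [];
    foreach (ELD_MINIMUMS_MB[$database] as $setting => $minimum) {
        if (($currentMb[$setting] ?? 0) < $minimum) {
            $tooLow[$setting] = $minimum;
        }
    }
    return $tooLow;
}

// Read the live configuration; OPcache directives are already expressed in MB.
// ini_get() returns false when OPcache is not loaded, which casts to 0.0.
$current = [
    'memory_limit'                    => iniSizeToMb((string) ini_get('memory_limit')),
    'opcache.interned_strings_buffer' => (float) ini_get('opcache.interned_strings_buffer'),
    'opcache.memory_consumption'      => (float) ini_get('opcache.memory_consumption'),
];
print_r(eldSettingsBelowMinimum('medium', $current));
```

An empty array means every setting reaches the minimum for the chosen database; the recommended values in parentheses are not checked here.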

Testing

A default Composer install might not include the test files. Use --prefer-source to include them.

Languages

ISO 639-1 codes (default):
am, ar, az, be, bg, bn, ca, cs, da, de, el, en, es, et, eu, fa, fi, fr, gu, he, hi, hr, hu, hy, is, it, ja, ka, kn, ko, ku, lo, lt, lv, ml, mr, ms, nl, no, or, pa, pl, pt, ro, ru, sk, sl, sq, sr, sv, ta, te, th, tl, tr, uk, ur, vi, yo, zh

Language names:
Amharic, Arabic, Azerbaijani (Latin), Belarusian, Bulgarian, Bengali, Catalan, Czech, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Armenian, Icelandic, Italian, Japanese, Georgian, Kannada, Korean, Kurdish (Arabic), Lao, Lithuanian, Latvian, Malayalam, Marathi, Malay (Latin), Dutch, Norwegian, Oriya, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovene, Albanian, Serbian (Cyrillic), Swedish, Tamil, Telugu, Thai, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, Yoruba, Chinese

ISO 639-1 codes with script subtags:
am, ar, az-Latn, be, bg, bn, ca, cs, da, de, el, en, es, et, eu, fa, fi, fr, gu, he, hi, hr, hu, hy, is, it, ja, ka, kn, ko, ku-Arab, lo, lt, lv, ml, mr, ms-Latn, nl, no, or, pa, pl, pt, ro, ru, sk, sl, sq, sr-Cyrl, sv, ta, te, th, tl, tr, uk, ur, vi, yo, zh

Three-letter codes (ISO 639-2/T):
amh, ara, aze, bel, bul, ben, cat, ces, dan, deu, ell, eng, spa, est, eus, fas, fin, fra, guj, heb, hin, hrv, hun, hye, isl, ita, jpn, kat, kan, kor, kur, lao, lit, lav, mal, mar, msa, nld, nor, ori, pan, pol, por, ron, rus, slk, slv, sqi, srp, swe, tam, tel, tha, tgl, tur, ukr, urd, vie, yor, zho


Donations and suggestions

If you wish to donate for open source improvements, hire me for private modifications, request alternative dataset training, or contact me, please use the following link: https://linktr.ee/nitotm


All versions of efficient-language-detector with dependencies

Requires:

  • php: ^7.4 || ^8.0
  • ext-mbstring: *
