Download the PHP package nicmart/dgim without Composer

On this page you can find all versions of the php package nicmart/dgim. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After the download, you have to make one include require_once('vendor/autoload.php');. After that you have to import the classes with use statements.

Example:
If you use only one package a project is not needed. But if you use more then one package, without a project it is not possible to import the classes with use statements.

In general, it is recommended to use always a project to download your libraries. In an application normally there is more than one library needed.
Some PHP packages are not free to download and because of that hosted in private repositories. In this case some credentials are needed to access such packages. Please use the auth.json textarea to insert credentials, if a package is coming from a private repository. You can look here for more information.

  • Some hosting areas are not accessible by a terminal or SSH. Then it is not possible to use Composer.
  • To use Composer is sometimes complicated. Especially for beginners.
  • Composer needs much resources. Sometimes they are not available on a simple webspace.
  • If you are using private repositories you don't need to share your credentials. You can set up everything on our site and then you provide a simple download link to your team member.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as binary. This makes your build process faster and you don't need to expose your credentials for private repositories.
Please rate this library. Is it a good library?

Informations about the package dgim

The Datar-Gionis-Indyk-Motwani algorithm

Build Status Scrutinizer Code Quality

Given a stream of bits and a window size N, we want be able to answer the question

How many 1s appeared in the last k bits?

Of course the answer is very simple if we are able to store in memory all last N bits. If it's not the case, we have to use a smarter way to store and to query the data.

The DGIM algorithm allows us to answer the question with a logarithmic amount of memory, and with tunable precision.

More precisely, for a given precision 1/m, the needed amount of memory is O(m log(N)²).

Just to outline, log(N)² (base 2) is a ridicously low number compared to N, for big N. For example, if N is 80 bilions, log(N)² is 1311.

In this library you can find a PHP implementation of the algorithm, together with the generalization to streams of non negative integers. Yeah, PHP is not the proper tool for memory-intensive tasks, I've written this mainly for experimental purposes.

How does it work?

The main idea is to store "buckets" of bits, with an exponential increasing size. For each bucket we store only the timestamp of its latest one and the number of ones it contains.

For more details you can check section 4.6.2 of the book Mining of Massive Datasets, freely available in PDF format.

Here you can find a sketch that explains the behaviour of the algorithm for a window of 8 bits when asking the number of ones of the last 7 bits. As you can see, when there are more than two buckets of the same size, the two oldest are condensed into a twice sized one. Timestamps are stored modulo N.

To calculate the sum of last K bits, we find the earliest bucket whose timestamp is in the k-window, we take an half of its value, and we sum to it the sizes of all the following buckets.

Counting 1s

The only component the client need to use is the Counter class. This is the way to use it for counting ones with a max 1% error:

Counting the sum of positive integers

If you are dealing with a stream of positive integers instead of a stream of bits, you can use the counter to get the approximate sum of the last $k ints. To do that you need to pass to the Counter the maximum of the integers you will receive in the stream:

The only component the client need to use is the Counter class. This is the way to use it for counting ones with a max 1% error:

Command line example

You can test the precision of the library on randomly generated data using the examples/example.php file:

References

You can find a detailed description of the algorithm in section 4.6.2 of the wonderful book Mining of Massive Datasets, freely available in PDF format.

Technical notes

Each bucket stores the timestamp and the exponent as php integers. The most memory efficient implementation should only stores log(N) bits for the timestamp and log(log(N)) bits for the exponent.

Furthermore, the bucket sequence is implemented as a double linked list, so it is taking space also for the bucket links. An array implementation of the sequence and a language that provides true array objects can avoid this.

Install

The best way to install DGIM is through composer.

Just create a composer.json file for your project:

Then you can run these two commands to install it:

$ curl -s http://getcomposer.org/installer | php
$ php composer.phar install

or simply run composer install if you have have already installed the composer globally.

Then you can include the autoloader, and you will have access to the library classes:


All versions of dgim with dependencies

PHP Build Version
Package Version
Requires php Version >=5.4
Composer command for our command line client (download client) This client runs in each environment. You don't need a specific PHP version etc. The first 20 API calls are free. Standard composer command

The package nicmart/dgim contains the following files

Loading the files please wait ....