Download the PHP package magnusjt/mongotd without Composer
On this page you can find all versions of the php package magnusjt/mongotd. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Download magnusjt/mongotd
More information about magnusjt/mongotd
Files in magnusjt/mongotd
Package mongotd
Short Description Mongodb timeseries database
License MIT
Informations about the package mongotd
mongotd
A PHP implementation of a timeseries database with mongodb. It takes care of storing the timeseries as well as doing anomaly detection on them using different techniques (currently with either 3-sigma, Holt-winters, or Kolmogorrow-Smirnov).
The time series are stored using a sid (a sensor ID, i.e. what does the data represent) and a nid (a node ID, i.e. where the data is collected from). Together these IDs uniquely represent a timeseries. The time series may of type gauge, or incremental. In the latter case, contiguous values are subtracted from each other in order to get the rate of change.
- Examples of gauge counters are temperature, current number of visitors, etc.
- Examples of incremental counters are cumulative number of visitors, cumulative number of requests, etc.
In order to be efficient with storage, the data is stored in one-day chunks, using a hash with the key being the number of seconds since midnight. This avoids storing metadata for each and every data point.
On retrieval of the data, aggregation can be performed both in time (sum/avg/max/min over an hour, day, etc.), space (sum/avg/max/min over certain nodes), or by a given formula. Do note, however, that pre-aggregation is not done. This is to avoid the timezone problem (i.e. a certain day may contain different hours, depending on which time zone the data is viewed in).
Installation
Install with composer:
`
Basic usage
First we grab a mongodb connection and initialize mongotd. ``
About sensor ids and node ids
Sensor ids and node ids are stored as strings in the database. Therefore, any supplied ids must be castable to string.
About inserted values
Inserted values are always either integer or doubles, and must be castable to those data types.
Inserting data
Inserting data is done by using a set of storage middleware. You can pick and choose what middleware you want, or even create your own. In the extreme case, it is actually possible to replace mongodb with another database.
For example: `
Storing time series data is done via an array of CounterValue:
`
Retrieving data
Because there are so many ways of massaging time series data, we use a pipeline solution for processing the raw values. The pipeline can be assembled in many different ways, and it is possible to add new or modified processing steps according to your needs.
Some examples of what you can do with the default pipeline:
Roll up to daily values by sum, average, max, min:
`
Combine different series using sum, average, max, min:
`
In this example you can see that we first roll up each series, then combine them into one series, and finally roll that result up to the daily series.
In this case we just sum everything, so the order of operations isn't that big a deal. However, if you're using aggregations such as average/max/min, the order does matter. By using the pipeline, you have full control of the order of operations.
E.g.:
Combine different series using a formula
Example: `
Pipeline factory
Some default pipeline sequences can be created using the Mongotd\Pipeline\Factory class. Of course, it's possible to create custom factories as well.
What about timezones?
Timezones are a huge pain when dealing with time series. It's not enough to figure out the timezone offset for the time series, since daylight savings time may cause the offset to change during the duration of the series. It is possible to use some datetime library to get rid of some of this pain, but the problem then becomes performance. So in this project we use timestamps for as many things as possible, and only take timezones into account when we absolutely have to. The most common place where we need to think about timezones is when rolling up in time. Here we offset each timestamp in the calculation such that when the result is converted into a datetime in the given timezone, the values will fall exactly at the desired resolution steps.
Finding anomalous data
Anomalies are found during the insertion of new data. To achieve this, we need to decide on an anomaly test to use.
Example using the 3-sigma method:
`
The 3-sigma method takes the average and standard deviation of all values that fall on the same time of day as the current time. It compares this average to the current value, using the 3*STD as the allowed deviation from the average.
The downside to the 3-sigma method is that it assumes the values are normally distributed (which is not always the case). In this case, it also assumes that values follow a daily profile where values are likely to be the same at the same time of the day. And finally, the method needs to go far back in time and retrieve values, which may be slow.
Another method for finding anomalies is the Holt-Winters algorithm. It has the same downside as 3-sigma, in that the data should be normally distributed and follow a time profile. However, it uses exponential smoothing instead of exact calculation of average and std. Therefore, it should be much faster.
Example using Holt-Winters:
`
Retrieving anomalies
Using the pipeline, we can also find stored anomalies which were detected during insertion.
In addition, we have a helper sequence for finding the anomaly state in a given period. The state is an array of timestamp => 0|1, rolled up to the desired resolution.
``