Download the PHP package rkr/data-diff without Composer

On this page you can find all versions of the php package rkr/data-diff. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After the download, add a single include to your script: require_once('vendor/autoload.php'). After that, import the classes with use statements.

Example:
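A minimal sketch of such a bootstrap file. The namespace DataDiff is an assumption that is not stated on this page, and the location of the vendor folder next to the script is assumed as well; adjust both to your download.

```php
<?php
// Assumption: the package's classes live in the namespace "DataDiff".
use DataDiff\MemoryDiffStorage;

// Assumption: the downloaded vendor folder sits next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';

$autoloaderLoaded = false;
if (is_file($autoload)) {
    // One include is enough; the autoloader resolves all package classes.
    require_once $autoload;
    $autoloaderLoaded = true;
}

echo $autoloaderLoaded ? "autoloader loaded\n" : "vendor/autoload.php not found\n";
```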
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and because of that hosted in private repositories. In this case some credentials are needed to access such packages. Please use the auth.json textarea to insert credentials, if a package is coming from a private repository. You can look here for more information.

  • Some hosting areas are not accessible by a terminal or SSH. Then it is not possible to use Composer.
  • Composer is sometimes complicated to use, especially for beginners.
  • Composer needs a lot of resources. Sometimes they are not available on a simple webspace.
  • If you are using private repositories you don't need to share your credentials. You can set up everything on our site and then you provide a simple download link to your team member.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as binary. This makes your build process faster and you don't need to expose your credentials for private repositories.

Information about the package data-diff

data-diff


A handy tool for comparing structured data quickly in a key-value manner

composer

See here

WTF

This component is interesting for you if you have a lot of structured data to import into a local database (for example) and you don't want to overwrite everything on each run. Instead, you want to know what has actually changed and act accordingly.

Usage

In the beginning, you have two two-dimensional data lists you want to compare. Normally, some columns of such a data list determine which rows are new and which are missing. Other columns indicate that an existing row has changed. You could also have columns that do not cause any action, but whose data is needed in subsequent processing.

Let's say you have some article meta-data from an external data source and you would like to have that data in a local database. The external data should be imported into that local database, and you want to act (e.g. by logging) whenever a dataset was added, removed or changed:

External Data:

Local data:

Each list has three data rows. Each list contains one row that is not available in the other list, and the common rows (A Hairdryer;C0001 and A Pencil;D0001) differ in the columns price and stock while the name is equal in both lists. Whatever is in the column current-datetime should not be compared, but it should be carried along in case of an insertion or an update. The final goal is to bring all the changes from the external data source to the local database. It could be important to know that current-datetime has changed while all other columns remain unchanged, but in this case I want to show how to handle a case where this is not important.

An actual compare result is computed by comparing two distinct key-value lists. The comparison is made through three methods that find added keys, missing keys and changed data where keys are equal. So, in order to get this information, you need to define when a particular row counts as added, removed or changed. This is not always a clear task and depends on the data in question. In this example, I will set some rules that could be different in your scenario.

In this example, we will only consider the reference to tell if a row is new in a list or has been removed. The local database has a reference to an article A0001 that is not included in the external data, so we want to remove A0001 from our local data. B0001 is not present in our local data, so it should be added. The Hairdryer has a different stock and the Pencil has a slightly different price. Since we store our prices locally with a decimal precision of two, the two pencil prices are actually equal and the comparison should not report a change to the row D0001.

You first need to tell the Storage what exactly is a key and what is a value to define the schema of what the Storage should understand as a key-value-list. We don't want to transform the list, since the data is already fine.

So, let's give some meaning to the columns:

So when we build a key-value-array to make the actual comparison, the key-part is made of the reference and the value-part is represented by the columns name, price and stock.
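The split into a key part and a value part can be sketched in plain PHP, independent of the library. The helpers pickColumns() and toKeyValue() are hypothetical, and the sample values are invented except for those mentioned in the text (references, names, the pencil price, the hairdryer stock and the last-change timestamp).

```php
<?php
// Hypothetical helper, not part of the library: pick the given columns in schema order.
function pickColumns(array $row, array $columns): array
{
    $picked = [];
    foreach ($columns as $column) {
        $picked[$column] = $row[$column];
    }
    return $picked;
}

// Hypothetical helper: turn a list of rows into a key => value array.
function toKeyValue(array $rows, array $keyColumns, array $valueColumns): array
{
    $result = [];
    foreach ($rows as $row) {
        // The key columns are serialized so that they can be used as an array key.
        $key = json_encode(pickColumns($row, $keyColumns));
        $result[$key] = pickColumns($row, $valueColumns);
    }
    return $result;
}

// Two sample rows of the external list (partly invented values).
$external = [
    ['name' => 'A Hairdryer', 'reference' => 'C0001', 'price' => 49.95, 'stock' => 66, 'last-change' => '2016-04-01T10:00:00+02:00'],
    ['name' => 'A Pencil', 'reference' => 'D0001', 'price' => 2.9499, 'stock' => 100, 'last-change' => '2016-04-01T10:00:00+02:00'],
];

// The key part is the reference; the value part is name, price and stock.
$keyValue = toKeyValue($external, ['reference'], ['name', 'price', 'stock']);
print_r($keyValue);
```

Columns that are in neither schema, such as last-change here, simply do not take part in the comparison.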

The key-value-array of the first list would then look like this:

The key-value-array of the second-list would look like this:

Now, let's compare those arrays in three distinct ways:

What rows are present in the first list, but not in the second:

What rows are present in the second list, but not in the first:

What rows are present in the first list, but have changed values compared to the second list?
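The three comparisons can be sketched with plain PHP array functions. This is an illustration of the idea, not the library's implementation. The arrays are keyed by reference; the names of A0001 and B0001 and most prices and stock values are invented. No normalization is applied here, so the pencil still shows up as changed.

```php
<?php
// First list: external data (partly invented values).
$first = [
    'B0001' => ['name' => 'Item B', 'price' => 10.0, 'stock' => 5],
    'C0001' => ['name' => 'A Hairdryer', 'price' => 49.95, 'stock' => 66],
    'D0001' => ['name' => 'A Pencil', 'price' => 2.9499, 'stock' => 100],
];

// Second list: local data (partly invented values).
$second = [
    'A0001' => ['name' => 'Item A', 'price' => 20.0, 'stock' => 7],
    'C0001' => ['name' => 'A Hairdryer', 'price' => 49.95, 'stock' => 12],
    'D0001' => ['name' => 'A Pencil', 'price' => 2.95, 'stock' => 100],
];

// Keys present in the first list, but not in the second.
$onlyInFirst = array_keys(array_diff_key($first, $second));

// Keys present in the second list, but not in the first.
$onlyInSecond = array_keys(array_diff_key($second, $first));

// Keys present in both lists whose values differ.
$changed = [];
foreach (array_intersect_key($first, $second) as $reference => $values) {
    if ($values !== $second[$reference]) {
        $changed[] = $reference;
    }
}

echo 'Only in first: ', implode(', ', $onlyInFirst), "\n";
echo 'Only in second: ', implode(', ', $onlyInSecond), "\n";
echo 'Changed: ', implode(', ', $changed), "\n";
```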

You have all information to match all differences between the two lists.

We have a special case here. The pencil has a price of 2.9499 in the first list. But since we only want to compare the price with a decimal precision of two, the prices are actually identical, because the computed price of D0001 is 2.95 in both cases. This is where the schema of this component comes into play.
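The effect of such a normalization can be sketched in a few lines. The function normalizePrice() is hypothetical and only illustrates the idea of comparing prices at a fixed precision; it is not the library's schema implementation.

```php
<?php
// Hypothetical normalizer: format a price with a fixed precision of two decimals.
function normalizePrice(float $price): string
{
    return number_format($price, 2, '.', '');
}

$externalPrice = normalizePrice(2.9499);
$localPrice = normalizePrice(2.95);

// After normalization both prices are the same string, so no change is reported.
var_dump($externalPrice === $localPrice);
```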

When you define a MemoryDiffStorage you specify two schemas. One for the key-part and one for the value-part:

A MemoryDiffStorage consists of two stores: StoreA and StoreB. You can insert as many rows with as many columns into each store as you want, as long as the rows contain at least the columns defined in the schema. The columns also need to have the appropriate names, since these names are not translated automatically. You can, however, specify a translation when adding rows, using the second parameter of addRow and addRows. Without such a translation, if your columns have different names in the database and the other source, you have to normalize those names before you put the data into each store.
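Normalizing column names by hand could look like the following sketch. The helper translateColumns() and the database column names article_name and ref are made up for illustration; how the translation parameter of addRow and addRows is shaped is not shown on this page.

```php
<?php
// Hypothetical helper: rename the columns of a row according to a source => target map.
function translateColumns(array $row, array $translation): array
{
    $translated = [];
    foreach ($row as $column => $value) {
        // Columns without a mapping keep their original name.
        $translated[$translation[$column] ?? $column] = $value;
    }
    return $translated;
}

// A row as it might come from a database with differently named columns.
$dbRow = ['article_name' => 'A Pencil', 'ref' => 'D0001', 'price' => 2.95, 'stock' => 100];

$row = translateColumns($dbRow, ['article_name' => 'name', 'ref' => 'reference']);
print_r($row);
```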

Here is an example:

A good rule of thumb is to use store a for the data you already have and store b for the data to compare against (e.g. the data to import from an external data source).

Next, we can query one of the stores to find differences in the lists. Since store a holds our local data, we use store b to query the differences:

Get all data-sets that are present in store b but not in store a:

The result is: This row is not present in store b: B0001

Get all data-sets that are present in store a but not in store b:

The result is: This row is not present in store a: A0001

Get all changed data-sets:

The result is: This row is not present in store a: stock: 12 -> 66, last-change: -> 2016-04-01T10:00:00+02:00

As you may notice, D0001 is not present in the result set. This is because the schema already normalized the decimal precision of the column price, so that no difference occurred.

You can also access the data divided into keys and values as defined in each schema. This is helpful if you want to build SQL statements from the schema. You can treat the keys as the WHERE conditions of an UPDATE statement and the values as the actual data to change (SET):
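A minimal sketch of that idea in plain PHP, assuming the keys and values of a changed row are already available as two arrays. The helper buildUpdate(), the table name articles and the price value are hypothetical. Column names are inserted into the SQL text directly, so they must come from your own schema and never from user input.

```php
<?php
// Hypothetical helper: build a parameterized UPDATE statement from key and value columns.
function buildUpdate(string $table, array $keys, array $values): array
{
    $assign = function (string $column): string {
        return "`$column` = ?";
    };

    // Value columns go into SET, key columns into WHERE.
    $set = implode(', ', array_map($assign, array_keys($values)));
    $where = implode(' AND ', array_map($assign, array_keys($keys)));

    $sql = "UPDATE `$table` SET $set WHERE $where";
    // Parameters follow the placeholder order: values first, then keys.
    $params = array_merge(array_values($values), array_values($keys));

    return [$sql, $params];
}

[$sql, $params] = buildUpdate(
    'articles',
    ['reference' => 'C0001'],
    ['name' => 'A Hairdryer', 'price' => '49.95', 'stock' => 66]
);

echo $sql, "\n";
```

The resulting statement and parameter list can be handed to a prepared statement, for example via PDO::prepare() and PDOStatement::execute().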

Example

Output:


All versions of data-diff with dependencies

Requires:
  • php >= 7.1
  • ext-pdo *
  • ext-pdo_sqlite *
  • ext-json *
  • ext-mbstring *
