Download the PHP package rkr/data-diff without Composer
On this page you can find all versions of the php package rkr/data-diff. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Download rkr/data-diff
More information about rkr/data-diff
Files in rkr/data-diff
Package data-diff
Short Description A handy tool for comparing structured data quickly in a key-value manner
License MIT
Informations about the package data-diff
data-diff
A handy tool for comparing structured data quickly in a key-value manner
composer
Support for PHPStan
Add the following to your phpstan.neon
file:
WTF
This component is useful if you have a large amount of structured data to import into a local database and you want to identify changes without overwriting everything on each run. Instead, you can determine what has actually changed and take appropriate actions.
Usage
Initially, you have two two-dimensional data lists that you want to compare. Typically, some columns in such a data list indicate the actual differences in terms of new and missing rows. Other columns may indicate changes in existing rows. Additionally, some columns may not trigger any actions but their data could be necessary for subsequent processing.
For example, consider having some article metadata from an external data source that you would like to import into a local database. The external data should be imported into the local database, and you want to take action whenever a dataset is added, removed, or changed (e.g., logging).
External Data:
Local data:
Each list contains three data rows. Both lists have a row that is not present in the other list, and the only common rows (A Hairdryer;C0001
and A Pencil;D0001
) exhibit differences in the price
and stock
columns, while the name
column remains identical. The current-datetime
column should not be compared, but it should be present in case of an insertion or update. The primary objective is to synchronize all changes from the external data source to the local database. Although it might be important to track changes in the current-datetime
column while other columns remain unchanged, this example demonstrates how to handle a scenario where this is not a priority.
The comparison result is derived by comparing two distinct key-value lists. The comparison involves three methods to identify added keys, missing keys, and changed data where keys are equal. To achieve this, it is essential to determine whether a particular row was added, removed, or changed. This task can be complex and depends on the specific data. In this example, certain rules are established, which may vary in different scenarios.
In this example, only the reference
column is used to determine if a row is new or has been removed. For instance, the local database contains a reference to an article A0001
that is not present in the external data, necessitating its removal from the local data. Conversely, B0001
is absent in the local data and should be added. The Hairdryer has a different stock, and the Pencil has a slightly different price. Since prices are stored locally with a decimal precision of two, the two pencil prices are considered equal, and the comparison should not report a change for the row D0001
.
First, it is necessary to define what constitutes a key and a value for the Storage
to understand the key-value list schema. The data is already in the correct format, so no transformation is required.
So, let's give some meaning to the columns:
- The
reference
column indicates whether a particular row is present or not. This serves as the unique identifier for each row. A row may have more than one identifier column (such asreference
andenvironment-id
), but in this case, there is only one identifier. - The
name
column should only be considered when a row is already present in the other list. - The
price
column should only be considered when a row is already present in the other list. - The
stock
column should only be considered when a row is already present in the other list. - The
last-change
column should not be checked at all.
Therefore, when constructing a key-value array for comparison, the key part is composed of the reference
column, and the value part is represented by the name
, price
, and stock
columns.
The key-value array of the first list would then appear as follows:
The key-value-array of the second-list would look like this:
Now, let's compare those arrays in three distinct ways:
What rows are present in the first list, but not in the second:
What rows are present in the second list, but not in the first:
What rows are present in the first list, but have changed values compared to the second list?
You now have all the necessary information to identify the differences between the two lists.
Consider a special case: the pencil has a price of 2.9499
in the first list. However, since we only compare prices with a decimal precision of two, the prices are effectively identical, as the computed price for D0001
is 2.95
in both cases. This is where the Schema
component becomes relevant.
When defining a MemoryDiffStorage
, you specify two schemas: one for the key part and one for the value part:
A MemoryDiffStorage
consists of two stores: StoreA
and StoreB
. You can insert as many rows with as many columns into each store as you want, provided the rows contain at least the columns defined in the schema. The columns must have appropriate names since these names are not translated automatically. However, you can specify a translation when adding rows using the second parameter of addRow
and addRows
. This means that if your columns have different names in the database and the other source, you must normalize those keys before inserting the data into each store.
Here is a example:
A good rule of thumb is to use store a
for the data, you already have and to use store b
for the data to compare to (e.g. the data to import from an external data-source).
Next, we can query one of the stores to find differences in the lists. Since store a
holds our local data, we use store b
to query the differences:
Get all data-sets that are present in store b
but not in store a
:
The result is This row is not present in store b: B0001
.
Get all data-sets that are present in store a
but not in store b
:
The result is This row is not present in store a: A0001
.
Get all changed data-sets:
The result is This row is not present in store a: stock: 12 -> 66, last-change: -> 2016-04-01T10:00:00+02:00
.
Note that D0001
is absent from the result set. This is because the schema has normalized the decimal precision of the price
column, resulting in no detected differences.
Additionally, you can access the data divided into keys and values as defined in each schema. This is useful for constructing SQL statements, where keys can be used as WHERE
conditions in an UPDATE
statement, and values can represent the data to be changed (SET
).
Example
Output:
All versions of data-diff with dependencies
ext-pdo Version *
ext-pdo_sqlite Version *
ext-json Version *
ext-mbstring Version *