Download the PHP package k-samuel/faceted-search without Composer
On this page you can find all versions of the php package k-samuel/faceted-search. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Download k-samuel/faceted-search
More information about k-samuel/faceted-search
Files in k-samuel/faceted-search
Package faceted-search
Short Description PHP Faceted search
License MIT
Homepage https://github.com/k-samuel/faceted-search
Informations about the package faceted-search
PHP Faceted search library 3.x
Simplified and fast faceted search without using any additional servers such as ElasticSearch, etc.
It can easily process up to 500,000 items with 10 properties. Create individual indices for product groups or categories and you won't need to scale or use more complex tools for a long time. The software is more effective when operates together with Roadrunner, Swoole, etc.
In addition to faceted filters, it supports exclusive filters. The software is optimized for uncompromising performance.
2.x version
Install
composer require k-samuel/faceted-search
Aggregates
The main advantage of the library is the fast and simplified building of aggregates.
Aggregates in Simple Terms
Imagine that a user has chosen several filters in the interface. We need to update the interface so that only filters compliant with the user’s choice (overlapping product properties) are represented in a list of available filters. We also have to display a number of available products hidden behind each filter.
Every time a user selects a new parameter, аt the code level, determine how many options are available based on the user's selection and display a new list of filters in the interface
This is simple enough, even if products have different structure of properties.
Notes
We recommend to create an individual index for each product category or type and include in such index only fields of concern.
Use your database to store frequently changing fields, such as price, quantity, etc. The faceted search should be used for preliminary data filtering.
Try to reduce the number of records processed. Try to index only products that, for example, are held in stock to exclude processing data on unavailable products.
Performance tests
Testing on a set of products with ten attributes, searching with filters on three fields.
v3.1.0 Bench PHP 8.2.10 + JIT + opcache (no xdebug extension)
ArrayIndex
Items count | Memory | Query | Aggregate | Aggregate & Count | Sort by field | Results Found |
---|---|---|---|---|---|---|
10,000 | ~3Mb | ~0.0003 s. | ~0.0006 s. | ~0.001 s. | ~0.0002 s. | 907 |
50,000 | ~20Mb | ~0.001 s. | ~0.002 s. | ~0.006 s. | ~0.0005 s. | 4550 |
100,000 | ~40Mb | ~0.002 s. | ~0.006 s. | ~0.013 s. | ~0.001 s. | 8817 |
300,000 | ~95Mb | ~0.006 s. | ~0.016 s. | ~0.036 s | ~0.002 s. | 26891 |
1,000,000 | ~329Mb | ~0.024 s. | ~0.053 s. | ~0.134 s. | ~0.009 s. | 90520 |
1,000,000 UB | ~324Mb | ~0.046 s. | ~0.078 s. | ~0.159 s. | ~0.015 s. | 179856 |
FixedArrayIndex
Items count | Memory | Query | Aggregate | Aggregate & Count | Sort by field | Results Found |
---|---|---|---|---|---|---|
10,000 | ~2Mb | ~0.0005 s. | ~0.0009 s. | ~0.002 s. | ~0.0003 s. | 907 |
50,000 | ~12Mb | ~0.002 s. | ~0.003 s. | ~0.012 s. | ~0.0007 s. | 4550 |
100,000 | ~23Mb | ~0.006 s. | ~0.010 s. | ~0.029 s. | ~0.001 s. | 8817 |
300,000 | ~70Mb | ~0.012 s. | ~0.022 s. | ~0.072 s. | ~0.003 s. | 26891 |
1,000,000 | ~233Mb | ~0.045 s. | ~0.070 s. | ~0.257 s. | ~0.012 s. | 90520 |
1,000,000 UB | ~233Mb | ~0.068 s. | ~0.101 s. | ~0.290 s. | ~0.017 s. | 179856 |
(Apple M2 macOS 14.0)
- Items count - Number of products included in an index
- Memory - RAM used by an index
- Query - Time taken to generate a filtered product list
- Aggregate - Generation of a list of available filter values for products found. A list of common properties and their values for products found (Aggregates)
- Aggregate & Count - Generation of a list of available filter values for products found. A list of common properties and their values for products found and counting of products corresponding to each filter (Aggregates)
- Sort by field - Time taken to sort results by one of the fields.
- Results Found - The number of products founds
- UB - Unbalanced dataset (uneven distribution of values in fields)
Benchmark of a library experimental port at Golang https://github.com/k-samuel/go-faceted-search
Bench v0.3.3 go 1.21.1 darwin/arm64 with parallel aggregates.
Items count | Memory | Query | Aggregate & Count | Sort by field | Results Found |
---|---|---|---|---|---|
10,000 | ~7Mb | ~0.0002 s. | ~0.002 s. | ~0.0001 s. | 907 |
50,000 | ~14Mb | ~0.001 s. | ~0.017 s. | ~0.001 s. | 4550 |
100,000 | ~21Mb | ~0.002 s. | ~0.037 s. | ~0.001 s. | 8817 |
300,000 | ~47Mb | ~0.008 s. | ~0.107 s. | ~0.004 s. | 26891 |
1,000,000 | ~140Mb | ~0.031 s. | ~0.363 s. | ~0.015 s. | 90520 |
1,000,000 UB | ~138Mb | ~0.059 s. | ~0.899 s. | ~0.028 s. | 179856 |
(Apple M2 macOS 14.0)
The internal structure of index arrangement in versions on PHP and Golang will be different starting from experimental port ver. 0.0.3 due to peculiarities of the Hash Map internal structure in these languages. In Go, we had to stop using Hash Map to make data storage in slices more effective, which initially allowed us to match PHP version performance.
In PHP, array (hashMap) is more effective for the current task due to using DoubleLinkedList and HashMap key packing.
Go has more effective methods of reduction of the size of slices without copying data (used for list deduplication). This allows to find overlapping using sorted slices.
Further comparison makes little sense because of different algorithms.
Examples
Create an index using console/crontab etc.
Using in application
Indexers
If there are too many values for a certain field in your data, you may use Range Indexer to accelerate RangeFilter operation. For example, searching by price ranges of products. Prices can be divided into intervals with a required increment.
Please, remember that RangeFilter is a rather slow solution, and it’s better to avoid facets with high value variability.
Sorting values inside the range is only possible during the process of index creation, since this aspect is lost in case with real value. Thus, when using RangeIndexer you shouldn’t add individual values to a ready index. As a way to solve this problem, library adds new values to the end of the range and sorts them only between themselves (sorts new values and adds them to the end).
This specific feature makes sense only when you use results sorting by the field that is a range indexed using RangeIndexer.
RangeListIndexer allows creating your own ranges without using an increment as in case with RangeIndexer
Also, you can create your own indexers with range detection method
FixedArrayIndex
FixedArrayIndex is slower than ArrayIndex, but it uses much less RAM. FixedArrayIndex data are compatible with ArrayIndex.
Filter. Self-Filtration Features
When building aggregates, self-filtering of properties is disabled. This allows the user selecting a different value of the same field for filtering (switch the selection) with filter by a certain value of such field being on.
Example. The user wants to find a phone with 32Gb RAM, ticks this checkbox from the provided list (16, 32, 64). If self-filtering is on, then other options will disappear from the user interface. Only 32 Gb value will remain as it will be filtered on the basis of the user’s choice. In this case the user won’t be able to change his/her choice to 64 Gb or 16 Gb.
When building aggregates, field values are used to limit the list of available options of other fields. For example: Filter by “size” field value is used to limit the list of “brand” field results.
Everything depends on your library use scenario. Library was initially designed to simplify the user UI building. If you use library for technical analysis or statistics, enabling self-filtering will help you get expected results.
For all filters:
For individual filter:
More Examples
- Demo
- Performance test
- Bench
Tested but discarded concepts
Bitmap
- (+) Significantly less RAM consumption
- (+) Comparable search speed with filtering
- (-) Limited by property variability, may not fit into the selected bit mask
- (-) Longer index creation
- (-) The need to rebuild the index after each addition of an element or at the final stage of formation
- (-) Slow aggregates creation
Bloom Filter
- (+) Very low memory consumption
- (-) Possibility of false positives
- (-) Slow index creation
- (-) Longer search in the list of products
- (-) Very slow aggregates
Creating multiple indexes for different operations
- (-) Increased consumption of RAM