Toru 取る
Toru is a standalone tool for iterable collections,
ready for simple day-to-day tasks and advanced optimizations.
Most of its functionality is based on native generators for efficiency with large data sets.
💿
composer require dakujem/toru
📒 Changelog
Toru provides a few common
- iteration primitives (e.g. `map`, `filter`, `tap`),
- aggregates (e.g. `reduce`, `search`, `count`)
- and utility functions (e.g. `chain`, `valuesOnly`, `slice`, `limit`)
... implemented using generators.
TL;DR:
- transform elements of native `iterable` type* collections (iterators and arrays) without converting to arrays
- keys (indexes) provided for every mapper, filter, reducer or effect function
- fluent call chaining enabled (Lodash-style)
- lazy per-element evaluation of transformations
- transform large data sets without increasing memory usage
- better memory efficiency of generators compared to native array functions
- slower than native array functions or direct transformations inside a `foreach` block
* The `iterable` is a built-in compile-time type alias for `array|Traversable` encompassing all arrays and iterators, so it's not exactly a native type, technically speaking.
Fluent call chaining enables neat transformation composition.
Toru enables memory-efficient operations on large data sets, because it leverages generators, which work on a per-element basis and do not allocate memory for intermediate collections.
All callable parameters always receive keys along with values.
This is a key advantage over native functions like `array_map`, `array_reduce`, `array_walk`, or `array_filter`.
🥋
The package name comes from the Japanese word "toru" (取る), which may mean "to take", "to pick up" or even "to collect".
Examples
Task: Iterate over multiple large arrays (or other iterable collections) with low memory footprint:
Task: Filter and map a collection, also specifying new keys (reindexing):
Task: Create a list of all files in a directory as `path => FileInfo` pairs without risk of running out of memory:
Note that here we use the global function `_dash`, which you may optionally define in your project.
See the "Using a global alias" section below.
Usage
Most of the primitives described in the API section below are implemented in 3 forms:
- as a static method `Itera::*(iterable $input, ...$args)`, for simple cases
- as a fluent method of the `Dash` wrapper, `Dash::*(...$args): Dash`, best suited for fluent composition
- as a factory method that creates partially applied callables `IteraFn::*(...$args): callable`, to be composed into pipelines or used as filters (e.g. in Twig, Blade, Latte, ...)
Example of filtering and mapping a collection, then appending some more already processed elements.
Usage of the individual static methods:
Usage of the `Dash` wrapper for fluent call chaining:
Usage of the partially applied methods:
The `$processed` collection can now be iterated over.
All the above operations are applied only at this point, on a per-element basis.
API
Chaining multiple iterables: `chain`, `append`
The `chain` method creates an iterable composed of all the arguments.
The resulting iterable will yield all values (preserving keys) from the first iterable, then the next, then the next, and so on.
Compared to `array_replace` (or `array_merge` or the union operator `+` on arrays) this is very memory efficient,
because it does not double the memory usage.
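The chaining behaviour described above can be sketched with a plain generator. This is only an illustration of the technique, not Toru's source code, and `chainSketch` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration only; not part of Toru.
function chainSketch(iterable ...$inputs): Generator
{
    foreach ($inputs as $input) {
        // `yield from` forwards both keys and values of the inner iterable,
        // so no combined array is ever built in memory.
        yield from $input;
    }
}

$pairs = [];
foreach (chainSketch([1, 2], new ArrayIterator([3, 4])) as $key => $value) {
    $pairs[] = "$key:$value";
}

echo implode(' ', $pairs), PHP_EOL; // prints "0:1 1:2 0:3 1:4"
```

Note that the keys repeat (0, 1, 0, 1), which is harmless when looping but matters when the result is cast to an array.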
The `append` method appends iterables to the wrapped/input collection. It is an alias of the `chain` method.
The `append` method is present in the `IteraFn` and `Dash` classes only.
Appending makes no sense in the static context of the `Itera` class as there is nothing to append to.
In static context, use `Itera::chain` instead.
Mapping: `map`, `adjust`, `apply`, `reindex`, `unfold`
The `adjust` method allows mapping both values and keys.
The `apply` method maps values only,
and the `reindex` method allows mapping keys (indexes).
Do not confuse the `map` method with the native `array_map`; the native function has a different interface.
Instead, prefer to use the `apply` method to map values.
The `map` method is an alias of the `apply` method.
For each of these methods, all mapping callables receive the current key as the second argument.
The signature of the mappers is always
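As a minimal sketch, assuming the value-first, key-second argument order stated above, value and key mapping could be implemented with plain generators like this. The helper names `applySketch` and `reindexSketch` are hypothetical and only mirror the idea; they are not Toru's implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration; they are not Toru's API.
function applySketch(iterable $input, callable $values): Generator
{
    foreach ($input as $key => $value) {
        yield $key => $values($value, $key); // new value, original key
    }
}

function reindexSketch(iterable $input, callable $keys): Generator
{
    foreach ($input as $key => $value) {
        yield $keys($value, $key) => $value; // new key, original value
    }
}

$prices = ['apple' => 2, 'pear' => 3];

$labels = iterator_to_array(
    applySketch($prices, fn(int $price, string $name): string => "$name costs $price")
);
$upper = iterator_to_array(
    reindexSketch($prices, fn(int $price, string $name): string => strtoupper($name))
);

var_export($labels); // ['apple' => 'apple costs 2', 'pear' => 'pear costs 3']
var_export($upper);  // ['APPLE' => 2, 'PEAR' => 3]
```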
The `unfold` method allows mapping and/or flattening matrices one level.
One niche trick to map both values and keys using a single callable with `unfold`
is to return a single key-value pair (an array containing a single element with a specified key), like so:
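The following sketch shows the idea with a plain generator. `unfoldSketch` is a hypothetical stand-in that assumes the mapper returns an iterable which is then flattened one level; it is not Toru's actual code.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration; not part of Toru.
function unfoldSketch(iterable $input, callable $mapper): Generator
{
    foreach ($input as $key => $value) {
        // The mapper returns an iterable; its pairs are yielded one by one.
        yield from $mapper($value, $key);
    }
}

$users = [
    ['id' => 7, 'name' => 'Ann'],
    ['id' => 9, 'name' => 'Bob'],
];

// Returning a single-element array sets both the key and the value.
$namesById = iterator_to_array(
    unfoldSketch($users, fn(array $user, int $index): array => [$user['id'] => $user['name']])
);

var_export($namesById); // [7 => 'Ann', 9 => 'Bob']
```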
Reducing: reduce
This is an aggregate function; it will immediately consume the input.
Similar to `array_reduce`, but works with any iterable and passes keys to the reducer.
The reducer signature is
When using the `Dash::reduce` fluent call, the result is treated in two different ways:
- when an `iterable` value is returned, the result is wrapped into a new `Dash` instance to allow continuing the fluent call chain (useful for matrix reductions)
- when any other (`mixed`) value type is returned, the result is returned as-is
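A key-aware reduction over any iterable can be sketched as below. The helper name `reduceSketch` and the reducer argument order (carry, value, key) are assumptions made for this illustration, not a statement about Toru's API.

```php
<?php
declare(strict_types=1);

// Hypothetical helper; the (carry, value, key) order is an assumption.
function reduceSketch(iterable $input, callable $reducer, mixed $initial = null): mixed
{
    $carry = $initial;
    foreach ($input as $key => $value) {
        $carry = $reducer($carry, $value, $key);
    }
    return $carry;
}

// Works with a generator just as well as with an array.
$stock = (function (): Generator {
    yield 'apple' => 2;
    yield 'pear' => 3;
})();

$summary = reduceSketch(
    $stock,
    fn(string $carry, int $count, string $name): string => $carry . "$name=$count;",
    '',
);

echo $summary, PHP_EOL; // prints "apple=2;pear=3;"
```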
Filtering: filter
Create a generator that yields only the items of the input collection that the predicate returns truthy for.
Accept and eliminate elements based on a callable predicate.
When the predicate returns truthy, the element is accepted and yielded.
When the predicate returns falsy, the element is rejected and skipped.
The predicate signature is
Similar to `array_filter`, `iter\filter`.
Sidenote
Native `CallbackFilterIterator` may be used for similar results:
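A minimal sketch with made-up sample data. `CallbackFilterIterator` requires an `Iterator`, so an array has to be wrapped first (here in `ArrayIterator`); the callback receives the current value, the current key and the iterator itself.

```php
<?php
declare(strict_types=1);

$input = new ArrayIterator(['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4]);

// Keep only elements with even values; the key is available as the second argument.
$even = new CallbackFilterIterator(
    $input,
    fn(int $value, string $key): bool => $value % 2 === 0,
);

$result = iterator_to_array($even);
var_export($result); // ['b' => 2, 'd' => 4]
```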
Searching: `search`, `searchOrFail`, `firstValue`, `firstKey`, `firstValueOrDefault`, `firstKeyOrDefault`
These are aggregate functions; they will immediately consume the input.
Search for the first element that the predicate returns truthy for.
`search` returns the default value if no matching element is found, while `searchOrFail` throws.
The `firstKey` and `firstValue` methods throw when an empty collection is on the input,
while the `*OrDefault` variants return the specified default value in such a case.
The predicate signature is
Slicing: `slice`, `limit`, `omit`
Limit the number of yielded elements with `limit`,
skip a certain number of elements from the beginning with `omit`,
or use `slice` to combine both `omit` and `limit` into a single call.
Keys will be preserved.
Passing a zero or negative value to `$limit` yields an empty collection,
passing zero or negative values to `$omit`/`$offset` yields the full set.
Note that when omitting, the selected number of elements (`$omit`/`$offset`) is still iterated over but not yielded.
Similar to `array_slice`, preserving the keys.
Note:
Unlike `array_slice`, the keys are always preserved. Use `Itera::valuesOnly` when dropping the keys is desired.
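The rules above can be sketched with a plain generator. `sliceSketch` is a hypothetical helper written for this illustration under the behaviour described in this section; it is not Toru's implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration; not part of Toru.
function sliceSketch(iterable $input, int $offset, int $limit): Generator
{
    if ($limit <= 0) {
        return; // zero or negative limit: nothing is yielded
    }
    $seen = 0;
    $yielded = 0;
    foreach ($input as $key => $value) {
        if ($seen++ < $offset) {
            continue; // omitted elements are still iterated over, just not yielded
        }
        yield $key => $value; // keys are preserved
        if (++$yielded >= $limit) {
            return; // stop early, the rest of the input is never touched
        }
    }
}

$letters = ['a' => 'A', 'b' => 'B', 'c' => 'C', 'd' => 'D'];

$middle = iterator_to_array(sliceSketch($letters, 1, 2));
$none = iterator_to_array(sliceSketch($letters, 0, 0));

var_export($middle); // ['b' => 'B', 'c' => 'C']
```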
Alterations: `valuesOnly`, `keysOnly`, `flip`
Create a generator that will only yield values, keys, or will flip them.
The `flip` function is similar to `array_flip`,
the `valuesOnly` function is similar to `array_values`,
and the `keysOnly` function is similar to `array_keys`.
Conversions: `toArray`, `toArrayValues`, `toArrayMerge`, `toIterator`, `ensureTraversable`
These functions immediately use the input.
Convert the input to `array`/`Iterator` from generic `iterable`.
💡
Iterators in general, Generators specifically, impose a challenge when being cast to arrays. Read the "Caveats" section below.
There are 3 variants of the "to array" operation.
| Toru function | Behaves like | Associative keys | Numeric keys | Values overwritten when keys overlap |
|---|---|---|---|---|
| `toArray` | `array_replace` | preserved | preserved | values with any overlapping keys ❗ |
| `toArrayMerge` | `array_merge` | preserved | discarded | only values with associative keys |
| `toArrayValues` | `array_values` | discarded | discarded | no values are overwritten |
Tapping: `tap`, `each`
Create a generator that will call an effect function for each element upon iteration.
The signature of the effect function is
The return values are discarded.
Repeating: `repeat`, `loop`, `replicate`
The `repeat` function repeats the input as-is, indefinitely.
The `loop` function yields individual elements of the input, indefinitely.
The `replicate` function yields individual elements of the input, exactly the specified number of times.
Both `repeat` and `loop` should be wrapped into a `limit` and `valuesOnly` if cast to arrays.
Please note that if the `loop` and `replicate` functions have a generator on the input,
they may/will run into the issues native to generators - being non-rewindable and having overlapping indexes.
Producing: `make`, `produce`
The `produce` function will create an infinite generator that will call the provided producer function upon each iteration.
It is supposed to be used with the `limit` function.
The `make` function creates an iterable collection from its arguments.
It is only useful in scenarios where an iterator (a generator) is needed. Use arrays otherwise.
These two functions are only available as static `Itera` methods.
To produce both keys and values, one might use `unfold` to wrap the `produce`, which would return key=>value pairs.
Lazy evaluation
Generator functions are lazy by nature.
Invoking a generator function creates a generator object, but does not execute any code.
The code is executed once an iteration starts (e.g. via `foreach`).
By passing a generator as an input to another generator function, that generator is decorated and a new one is returned. This decoration is still lazy and no code execution occurs just yet.
💡
If an iteration is terminated before the whole collection has been iterated over (e.g. via `break`), the callables will NOT be called for the remaining elements.
This increases efficiency in cases where it is not known in advance how many elements of a collection will actually be consumed.
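The laziness described above can be observed with a plain generator and a call counter. The helper `lazyMapSketch` and the sample data are made up for this illustration and are not part of Toru.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration; not part of Toru.
function lazyMapSketch(iterable $input, callable $mapper): Generator
{
    foreach ($input as $key => $value) {
        yield $key => $mapper($value, $key);
    }
}

$calls = 0;
$mapper = function (int $value) use (&$calls): int {
    $calls++; // count how many times the mapper actually runs
    return $value * 10;
};

$mapped = lazyMapSketch([1, 2, 3, 4, 5], $mapper);
$callsBefore = $calls; // still 0: creating the generator executes nothing

foreach ($mapped as $value) {
    if ($value >= 20) {
        break; // the remaining three elements are never mapped
    }
}

echo "before: $callsBefore, after: $calls", PHP_EOL; // prints "before: 0, after: 2"
```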
Every function provided by Toru that returns `iterable` uses generators and is lazy.
Examples: `adjust`, `map`, `chain`, `filter`, `flip`, `tap`, `slice`, `repeat`
Other functions, usually returning `mixed` or scalar values, are aggregates
and cause immediate iteration and generator code execution, exhausting generators on the input.
Examples: `reduce`, `count`, `search`, `toArray`, `firstValue`
Using keys (indexes)
Callable parameters of all the methods (mapper, predicate, reducer and effect functions) always receive keys along with values.
This is a key advantage over native functions like `array_map`, `array_reduce` or `array_walk`,
even `array_filter` in its default setting.
Instead of
it may be more convenient to
With `array_reduce` this is even more convoluted, because there is no way to pass the keys to the native function.
One way to deal with it is to transform the array values to include the indexes
and to alter the reducer to account for the changed data type.
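That native workaround can be sketched as follows, using only built-in functions. The sample data and variable names are made up for illustration.

```php
<?php
declare(strict_types=1);

$stock = ['apple' => 2, 'pear' => 3];

// Step 1: pair every key with its value, producing [['apple', 2], ['pear', 3]].
// array_map with a null callback zips the given arrays together.
$pairs = array_map(null, array_keys($stock), $stock);

// Step 2: the reducer has to unpack the pair instead of receiving a plain value.
$summary = array_reduce(
    $pairs,
    fn(string $carry, array $pair): string => $carry . $pair[0] . '=' . $pair[1] . ';',
    '',
);

echo $summary, PHP_EOL; // prints "apple=2;pear=3;"
```

The intermediate `$pairs` array is a full copy of the data, which is the cost this approach pays for key access.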
Here, the solution may be even more concise
Custom transformations
To support custom transformation without interrupting a fluent call chain when using `Dash`,
two methods are provided:
`Dash::alter`
- expects a decorator function returning an altered iterable collection
- the iterable result is wrapped in a new `Dash` instance to continue the fluent call chain
`Dash::aggregate`
- expects an aggregate function that returns any value (`mixed`)
- terminates the fluent chain
`Dash::alter` wraps the return value of the decorator into a new `Dash` instance, allowing for fluent follow-up.
`Dash::aggregate` returns any value produced by the callable parameter, without wrapping it into a new `Dash` instance.
Missing a "key sum" function? Need to compute the median value?
Extending Toru
Extending the `Dash` class may be considered to implement custom transformations or aggregations to use within the fluent call chain.
The `Itera` and `IteraFn` classes may be extended for consistency with the extension to `Dash`.
Using a global alias
If you desire a global alias to create a Dash-wrapped collection, such as `_dash`,
the best way is to register the global function in your bootstrap like so:
You can also place this function definition inside a file (e.g. `/bootstrap/dash.php`) that you automatically load using Composer.
In your `composer.json` file, add an autoloader rule as such:
You no longer need to import the `Dash` class.
Take care when defining a global function `_` or `__`, as it may interfere with other functions (e.g. the Gettext extension) or common i18n function aliases.
Caveats
Generators, while being powerful, come with their own caveats:
- working with keys (indexes) may be tricky
- generators are not rewindable
Please understand generators before using Toru, it may help avoid a headache:
📖 Generators overview
📖 Generator syntax
Generators and caveats with keys when casting to array
There are two challenges native to generators when casting to arrays:
- overlapping keys (indexes)
- key types
Overlapping keys cause values to be overwritten when using `iterator_to_array`.
And since generators may yield keys of any type, using them as array keys may result in a `TypeError` exception.
The combination of `chain` and `toArray` (or `iterator_to_array`) behaves like native `array_replace`:
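A minimal sketch using only native PHP. The anonymous generator is an assumption standing in for a chained iterable of `[1, 2]` and `[3, 4]`; it is not Toru's code.

```php
<?php
declare(strict_types=1);

// Stand-in for a chained iterable: yields keys 0, 1 and then 0, 1 again.
$chained = (function (): Generator {
    yield from [1, 2];
    yield from [3, 4];
})();

// Keys are preserved by default, so later values overwrite earlier ones.
$result = iterator_to_array($chained);

var_export($result);
```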
The result will be `[3, 4]`, which might be unexpected. The reason is that the iterables (arrays in this case) have overlapping keys,
and the later values overwrite the previous ones when casting to an array.
This issue is not present when looping through the iterator:
The above will correctly output
See this code in action: generator key collision
If we are able to discard the keys, then the fastest solution is to use `toArrayValues`,
which is a shorthand for the chained call `Itera::toArray(Itera::valuesOnly($input))`.
If we wanted to emulate the behaviour of `array_merge`, Toru provides the `toArrayMerge` function.
This variant preserves the associative keys while discarding the numeric keys.
That call will produce the following array:
💡
Note that generators may typically yield keys of any type, but when casting to arrays, only values usable as native array keys are permitted; for keys of other value types, a `TypeError` will be thrown.
Generators are not rewindable
Once an iteration of a generator is started, calling `rewind` on it will throw an error.
This issue may be overcome using the provided `Regenerator` iterator (read below).
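The limitation is easy to reproduce with a native generator. This sketch uses made-up sample data and does not involve Toru at all.

```php
<?php
declare(strict_types=1);

$numbers = (function (): Generator {
    yield 1;
    yield 2;
})();

$first = [];
foreach ($numbers as $number) {
    $first[] = $number; // the first pass works as expected
}

$error = null;
try {
    foreach ($numbers as $number) {
        // never reached: the generator has already been consumed
    }
} catch (Throwable $e) {
    $error = $e->getMessage();
}

echo $error === null ? 'no error' : 'second pass failed', PHP_EOL; // prints "second pass failed"
```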
Supporting stuff
Regenerator
`Regenerator` is a transparent wrapper for callables returning iterators, especially generator objects.
These may be directly generator functions, or callables wrapping them.
`Regenerator` enables rewinding of generators, which is not permitted in general.
The generators are not actually rewound, but are created again upon each rewinding.
Since most of the iteration primitives in this library are implemented using generators, this might be handy.
Note: Rewinding happens automatically when iterating using `foreach`.
Let's illustrate it on an example.
This may be solved by calling the generator function repeatedly for each iteration.
In most cases, that will be the solution, but sometimes an `iterable`/`Traversable` object is needed.
This is where `Regenerator` comes into play.
`Regenerator` internally calls the provider function whenever needed (i.e. whenever rewound),
while also implementing the `Traversable` interface.
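The idea can be sketched in a few lines of native PHP. `RegeneratorSketch` is a hypothetical stand-in written for this illustration; the library's actual `Regenerator` class may be implemented differently.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in to illustrate the idea; not the library's Regenerator class.
final class RegeneratorSketch implements IteratorAggregate
{
    /** @var callable */
    private $provider;

    public function __construct(callable $provider)
    {
        $this->provider = $provider;
    }

    // foreach calls getIterator() at the start of every loop,
    // so each loop gets a fresh iterator from the provider.
    public function getIterator(): Generator
    {
        yield from ($this->provider)();
    }
}

$numbers = new RegeneratorSketch(function (): Generator {
    yield 'a' => 1;
    yield 'b' => 2;
});

$firstPass = iterator_to_array($numbers);
$secondPass = iterator_to_array($numbers); // works again, no exception

var_export($secondPass); // ['a' => 1, 'b' => 2]
```

Because the provider is called again on every pass, any side effects or expensive work inside it are repeated as well.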
Storing intermediate value
Since most calls to Toru functions return generators, storing the intermediate value in a variable suffers the same issue.
Again, the solution might be to create a function, like this:
Alternatively, the `Regenerator` class comes in handy.
Pipeline
A simple processing pipeline implementation.
Useful with the `IteraFn` class to compose processing algorithms.
Why iterables
Why bother with iterators when PHP has an extensive support for arrays?
There are many cases where iterators may be more efficient than arrays.
Usually when dealing with large (or possibly even infinite) collections.
The use-case scenario for iterators is comparable to stream resources. You will know that stuffing uploaded files into a string variable is not always the best idea. It will surely work with small files; try that with a 4K video, though.
A good example might be a directory iterator.
How many files might there be? Dozens? Millions? Stuffing that into an array may soon drain the memory reserves of your application.
So why use the `iterable` type hint instead of `array`?
Simply to extend the possible use-cases of a function/method, where possible.
Memory efficiency
The efficiency of generators stems from the fact that no extra memory needs to be allocated when doing stuff like chaining multiple collections, filtering, mapping and so on.
On the other hand, a `foreach` block will always execute faster, because there are no extra function calls involved.
Depending on your use case, the performance difference may be negligible, though.
However, in cloud environments, memory may be expensive. It is a tradeoff.
In real-world scenarios, with OPcache enabled, using Toru would decrease memory usage with minimal/negligible impact on execution time.
For example, chaining multiple collections into one instead of using `array_merge` will be more efficient.
https://3v4l.org/Ymksm
https://3v4l.org/OmUb3
https://3v4l.org/HMasj
Also use the comparison scripts in `/tests/performance` against your actual environment, if you are concerned about performance.
Alternatives
You might not need this library.
- `mpetrovich/dash` provides a full range of transformation functions, uses arrays internally
- `lodash-php/lodash-php` imitates Lodash and provides a full range of utilities, uses arrays internally
- `nikic/iter` implements a range of iteration primitives using generators, authored by a PHP core team member
- `illuminate/collections` should cover the needs of most Laravel developers, provides both array-based and generator-based implementations
- in many cases, a `foreach` will do the job
The Toru library (`dakujem/toru`) does not provide a full range of ready-made transformation functions,
but rather provides the most common ones and the means to bring in and compose your own transformations.
It works well with and alongside the aforementioned and other such libraries.
Toru originally started as an alternative to `nikic/iter` for daily tasks,
which to me has a somewhat cumbersome interface.
The `Itera` static class tries to fix that
by using a single class import instead of multiple function imports
and by reordering the parameters so that the input collection is consistently the first one.
Still, composing multiple operations into one transformation felt cumbersome, so the `IteraFn` factory was implemented to fix that.
It worked well, but was still kind of verbose for mundane tasks.
To allow concise fluent/chained calls (like with Lodash), the `Dash` class was then designed.
With it, it's possible to compose transformations neatly.
Contribution and future development
The intention is not to provide a plethora of specific functions, but rather to offer tools for the most common use cases.
That being said, good quality PRs will be accepted.
Possible additions may include:
- `combine` values and keys
- `zip` multiple iterables (python, haskell, etc.)
- `alternate` multiple iterables (load-balance elements, mix)
Appendix: Example code with annotations
Illustration of various approaches
Observe the code below to see `foreach` and `Dash` solve a simple problem.
See when and why `Dash` may be more appropriate than `Itera` alone.
Example: Listing images in a directory
Let us solve a simple task: List all images of a directory recursively.
You may have generative AI do this too, or come up with something like this:
This will work in development, but will have a huge impact on your server if you try to list millions of images, something not uncommon for mid-sized content-oriented projects.
The way to fix that is by utilizing a generator:
And what if you could create an equivalent generator like this...
It now depends on personal preference. Both will do the trick and be equally efficient.