Package substractor
Short Description A sub-string extractor for the regex ignorant
License MIT
Homepage https://github.com/jeffpacks/substractor
About
Substractor is a utility for matching strings against wildcard patterns and for extracting or manipulating substrings according to macro patterns. The pattern formats are easy to understand and suited to simple string matching, extraction and manipulation tasks, without the need for those wonderful and pesky regular expressions.
Dependencies
This library requires at least PHP 7.4.
Installing
Composer
Run `composer require jeffpacks/substractor` in your project's root directory.
Downloading
`git clone https://github.com/jeffpacks/substractor.git` or download from https://github.com/jeffpacks/substractor/archive/refs/heads/master.zip.
Add `require('/path/to/substractor/autoload.php')` to your PHP script, and you're good to go.
Patterns
Substractor uses only two types of wildcard tokens in its patterns:
- `*` matches any substring of zero or more characters
- `?` matches any single character
You can then build patterns like these and match them against strings:
- `http*://example.test` matches `http://example.test` and `https://example.test`
- `https://example.???` matches `https://example.com` but not `https://example.io`
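To make the two tokens concrete, here is a minimal, self-contained sketch of whole-string wildcard matching. The helper names `wildcard_to_regex()` and `wildcard_matches()` are hypothetical and are not part of Substractor; the sketch translates the pattern into a regular expression behind the scenes, which may or may not resemble what the library does internally.

```php
<?php
// Hypothetical helpers for illustration; not part of Substractor.

// Translate a wildcard pattern into an anchored regular expression.
function wildcard_to_regex(string $pattern): string
{
    // Escape all regex metacharacters so that only our two tokens are special.
    $quoted = preg_quote($pattern, '/');
    // preg_quote() has turned "*" into "\*" and "?" into "\?"; map them to regex equivalents.
    $regex = str_replace(['\*', '\?'], ['.*', '.'], $quoted);
    // "s" lets "." match newlines, "D" makes "$" match only at the very end.
    return '/^' . $regex . '$/sD';
}

// True when the whole string matches the wildcard pattern.
function wildcard_matches(string $string, string $pattern): bool
{
    return preg_match(wildcard_to_regex($pattern), $string) === 1;
}

var_dump(wildcard_matches('http://example.test', 'http*://example.test'));  // bool(true)
var_dump(wildcard_matches('https://example.test', 'http*://example.test')); // bool(true)
var_dump(wildcard_matches('https://example.com', 'https://example.???'));   // bool(true)
var_dump(wildcard_matches('https://example.io', 'https://example.???'));    // bool(false)
```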
The wildcard tokens are used for matching, but can also be used for extracting substrings.
A key feature of Substractor's extraction algorithms is that they are word-bound. They look for substrings in "words" (substrings delimited by whitespace characters) rather than across the entire string, giving you `foobar` and not `foobar gone?` when looking for `foo*` in `Where has the foobar gone?`.
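The word-bound behaviour can be sketched in a few lines. This is an illustration only, under the assumption that a "word" is any run of non-whitespace characters; `word_bound_subs()` is a hypothetical helper, not a Substractor method.

```php
<?php
// Hypothetical helper for illustration; not part of Substractor.
// Returns every whitespace-delimited word that fully matches the wildcard pattern.
function word_bound_subs(string $string, string $pattern): array
{
    // Build an anchored regex from the pattern: "*" becomes ".*", "?" becomes ".".
    $regex = '/^' . str_replace(['\*', '\?'], ['.*', '.'], preg_quote($pattern, '/')) . '$/sD';
    // Split the string into words, dropping empty pieces.
    $words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
    // Keep only the words that match, then reindex from zero.
    return array_values(array_filter(
        $words,
        fn(string $word): bool => preg_match($regex, $word) === 1
    ));
}

print_r(word_bound_subs('Where has the foobar gone?', 'foo*')); // ['foobar']
```

Because each word is tested on its own, the `*` can never run past `foobar` into ` gone?`.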
Macros
A macro is a named substring or a named set of substrings. By including macro tokens in your patterns, you can extract named substrings. A macro named `foo` looks like this in a pattern: `{foo}`. Extracting the protocol, domain and top-level domain of a URL would require a pattern with macro tokens like this: `{protocol}://{domain}.{top}*`
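One way to picture a macro token is as a named capture group. The sketch below is self-contained and does not use the library; `macro_pattern_to_regex()` and `extract_macros()` are hypothetical names, and the rule that a macro captures a run of letters, digits, underscores or hyphens is an assumption made for this example, not documented Substractor behaviour.

```php
<?php
// Hypothetical helpers for illustration; not Substractor's implementation.
// Assumption: a macro captures a run of letters, digits, underscores or hyphens.
function macro_pattern_to_regex(string $pattern): string
{
    // Escape regex metacharacters; "{foo}" becomes "\{foo\}" and "*" becomes "\*".
    $regex = preg_quote($pattern, '/');
    // Wildcards stay inside a single word, so they never match whitespace.
    $regex = str_replace(['\*', '\?'], ['\S*', '\S'], $regex);
    // Replace each escaped macro token with a named capture group.
    $regex = preg_replace_callback(
        '/\\\\\{(\w+)\\\\\}/',
        fn(array $m): string => '(?<' . $m[1] . '>[\w-]+)',
        $regex
    );
    return '/' . $regex . '/';
}

// Returns the first match as a macro-name => substring map, or [] when nothing matches.
function extract_macros(string $string, string $pattern): array
{
    if (preg_match(macro_pattern_to_regex($pattern), $string, $matches) !== 1) {
        return [];
    }
    // preg_match() returns numeric keys as well; keep only the named groups.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(extract_macros('Visit https://example.com/docs today', '{protocol}://{domain}.{top}*'));
// ['protocol' => 'https', 'domain' => 'example', 'top' => 'com']
```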
The methods
Substractor::matches()
This method indicates whether a given string matches a pattern that you specify. Here's an example:
Substractor::subs()
We can extract all substrings that fully match a given pattern with this method. Say you want to extract all e-mail addresses from a string. It'll be easy – like breaking a toothpick:
Substractor::macros()
This method lets you extract named substrings. If you want to parse a route URL, like in Laravel, you would do it like so:
You can of course specify multiple patterns:
There may be cases where you want to use one macro pattern over another based on the general pattern of the string you're searching. We achieve this by keying each macro pattern with a "key pattern" which the entire string must match in order for the macro pattern to be used. Here's an example:
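The selection step can be sketched as follows. This is a hedged illustration rather than the library's code: `select_macro_pattern()` is a hypothetical helper, and both the array shape (key pattern as array key, macro pattern as value) and the sample patterns are assumptions based on the description above.

```php
<?php
// Hypothetical helper for illustration; not part of Substractor.
// Returns the macro pattern whose key pattern matches the ENTIRE string, or null.
function select_macro_pattern(string $string, array $keyedPatterns): ?string
{
    foreach ($keyedPatterns as $keyPattern => $macroPattern) {
        // Anchored wildcard match: "*" becomes ".*", "?" becomes ".".
        $regex = '/^'
            . str_replace(['\*', '\?'], ['.*', '.'], preg_quote((string) $keyPattern, '/'))
            . '$/sD';
        if (preg_match($regex, $string) === 1) {
            return $macroPattern; // first key pattern that matches wins
        }
    }
    return null;
}

// Assumed sample patterns, chosen only for this sketch.
$patterns = [
    'http*://*' => '{protocol}://{domain}.{top}*',
    'mailto:*'  => 'mailto:{user}@{domain}.{top}',
];

var_dump(select_macro_pattern('mailto:jeff@example.test', $patterns));
// string(29) "mailto:{user}@{domain}.{top}"
```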
Note that this method will only return one substring per macro token. If the string contains several matching substrings, only the first match for each macro is returned. Example:
If you need to extract all matching substrings, use the `Substractor::macrosAll()` method.
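The difference between "first match only" and "all matches" can be illustrated without the library. In this sketch, `first_macros()` and `all_macros()` are hypothetical stand-ins, the return shapes are assumptions, and a macro is again assumed to capture letters, digits, underscores or hyphens.

```php
<?php
// Hypothetical helpers for illustration; not Substractor's implementation.
// Assumption: a macro captures a run of letters, digits, underscores or hyphens.
function macro_regex(string $pattern): string
{
    $regex = str_replace(['\*', '\?'], ['\S*', '\S'], preg_quote($pattern, '/'));
    // Turn each escaped "{name}" token into a named capture group.
    $regex = preg_replace_callback(
        '/\\\\\{(\w+)\\\\\}/',
        fn(array $m): string => '(?<' . $m[1] . '>[\w-]+)',
        $regex
    );
    return '/' . $regex . '/';
}

// One substring per macro: only the first match in the string is used.
function first_macros(string $string, string $pattern): array
{
    preg_match(macro_regex($pattern), $string, $m);
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// A list of substrings per macro: every match in the string is collected.
function all_macros(string $string, string $pattern): array
{
    preg_match_all(macro_regex($pattern), $string, $m);
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$text = 'Mail alice@example.com or bob@example.org';
print_r(first_macros($text, '{user}@{domain}.{top}'));
// ['user' => 'alice', 'domain' => 'example', 'top' => 'com']
print_r(all_macros($text, '{user}@{domain}.{top}'));
// ['user' => ['alice', 'bob'], 'domain' => ['example', 'example'], 'top' => ['com', 'org']]
```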
Substractor::macrosAll()
This method will return named sets of substrings, as opposed to `Substractor::macros()`, which only returns single named substrings. Here's an example:
Substractor::pluck()
If you only want the first matching substring, this method is for your convenience. It will call `Substractor::macros()` for you and return the substring of a macro you specify. Example:
Substractor::pluckAll()
If you want to pluck all substrings matching a given macro token, use this method. It will call `Substractor::macrosAll()` for you and return the substrings of a macro you specify. Example:
Substractor::replace()
If you want to replace or manipulate a specific substring within some other string, you can use the `Substractor::replace()` method. By specifying a macro pattern, you can tell it which of those macros you want to replace or manipulate using method chaining. Here's an example where we encrypt and decrypt the password segment of a URL:
The object returned from `Substractor::replace()` has (magic) methods that correspond to the macro tokens you specify in your macro pattern. This way, you can specifically target the substrings you want to replace, even if some of them are identical. Each method will return the same object, so you can chain your calls:
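The chaining idea can be sketched with PHP's `__call()` magic method. Everything here is hypothetical: the `MacroReplacer` class, its `run()` method and the choice to accept either a string or a closure are assumptions for illustration, not the object that the library returns. `strrev()` stands in for a real encryption routine.

```php
<?php
// Hypothetical class for illustration; not Substractor's API.
// Assumption: a macro captures a run of letters, digits, underscores or hyphens.
final class MacroReplacer
{
    private string $subject;
    private string $regex;
    private array $replacements = []; // macro name => string or Closure

    public function __construct(string $subject, string $pattern)
    {
        $this->subject = $subject;
        $quoted = str_replace(['\*', '\?'], ['\S*', '\S'], preg_quote($pattern, '/'));
        $this->regex = '/' . preg_replace_callback(
            '/\\\\\{(\w+)\\\\\}/',
            fn(array $m): string => '(?<' . $m[1] . '>[\w-]+)',
            $quoted
        ) . '/';
    }

    // Calls such as ->password('x') land here; the method name is the macro name.
    public function __call(string $macro, array $arguments): self
    {
        $this->replacements[$macro] = $arguments[0];
        return $this; // same object, so calls can be chained
    }

    public function run(): string
    {
        if (preg_match($this->regex, $this->subject, $m, PREG_OFFSET_CAPTURE) !== 1) {
            return $this->subject;
        }
        $edits = [];
        foreach ($this->replacements as $macro => $replacement) {
            if (isset($m[$macro])) {
                [$value, $offset] = $m[$macro];
                $new = $replacement instanceof Closure ? $replacement($value) : (string) $replacement;
                // Key by offset so identical substrings are still told apart.
                $edits[$offset] = [strlen($value), $new];
            }
        }
        krsort($edits); // apply right to left so earlier offsets stay valid
        $result = $this->subject;
        foreach ($edits as $offset => [$length, $new]) {
            $result = substr_replace($result, $new, $offset, $length);
        }
        return $result;
    }
}

$pattern = '{protocol}://{user}:{password}@{host}';
echo (new MacroReplacer('ftp://admin:admin@example.test', $pattern))
    ->password('hidden')
    ->run(), PHP_EOL; // ftp://admin:hidden@example.test
echo (new MacroReplacer('ftp://jeff:secret@example.test', $pattern))
    ->user('guest')
    ->password(fn(string $p): string => strrev($p))
    ->run(), PHP_EOL; // ftp://guest:terces@example.test
```

Replacing by offset rather than by value is what lets the first call change only the password, even though the user name is the identical string `admin`.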
Redaction
Sometimes characters get in the way of what you're trying to match and/or extract. Consider a Markdown document containing links, such as this one: `You can [e-mail me](mailto:[email protected]) or reach me on [Github](https://github.com/jeffpacks)`. Attempting to extract the substring `e-mail me` using the pattern `[*]` would yield no result because the single `*` wildcard tries to match a single word, but there are two words between these brackets. This is a case where we can use the redaction parameter that all the Substractor methods accept. There are three types of redaction:
- Pre-redaction: A given character/string will be removed before the matching takes place, but is left intact in the returned substrings
- Post-redaction: A given character/string is left untouched prior to matching, but is removed from the returned substrings
- Full redaction: A given character/string is removed prior to matching and is also removed from the returned substrings (a combination of pre- and post-redaction).
We can use pre-redaction to remove the space character between the two words prior to matching, but allow it to remain untouched in the returned substring:
The above will give the substring `[e-mail me]`, including the brackets. If we don't want the brackets, we will have to do a post-redaction on those:
This gives us the `e-mail me` substring only, as the brackets have been post-redacted.
Full redaction is a less common use case, but is indicated by specifying a boolean `true` as the value of the redaction array.
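To show how pre- and post-redaction interact, here is a self-contained sketch that does not use the library. `redacted_subs()`, its parameters and the masking trick (hiding the pre-redacted string behind a NUL byte while matching, then restoring it) are all assumptions for illustration and do not reflect Substractor's redaction parameter or internals. The e-mail address in the sample is a placeholder.

```php
<?php
// Hypothetical helper for illustration; not Substractor's implementation or signature.
// $pre:  a string ignored while matching but kept in the returned substrings.
// $post: strings matched normally but stripped from the returned substrings.
function redacted_subs(string $string, string $pattern, ?string $pre = null, array $post = []): array
{
    $mask = "\x00"; // assumed never to occur in the input
    // Pre-redaction: hide the string from the matcher behind a non-whitespace mask.
    $masked = $pre === null ? $string : str_replace($pre, $mask, $string);
    // Word-bound, non-greedy wildcards: "*" becomes "\S*?", "?" becomes "\S".
    $regex = '/' . str_replace(['\*', '\?'], ['\S*?', '\S'], preg_quote($pattern, '/')) . '/';
    preg_match_all($regex, $masked, $m);
    return array_map(
        function (string $sub) use ($mask, $pre, $post): string {
            // Restore the pre-redacted string, then strip the post-redacted ones.
            $sub = $pre === null ? $sub : str_replace($mask, $pre, $sub);
            return str_replace($post, '', $sub);
        },
        $m[0]
    );
}

$md = 'You can [e-mail me](mailto:jeff@example.test) or reach me on [Github](https://github.com/jeffpacks)';

print_r(redacted_subs($md, '[*]'));                  // ['[Github]']
print_r(redacted_subs($md, '[*]', ' '));             // ['[e-mail me]', '[Github]']
print_r(redacted_subs($md, '[*]', ' ', ['[', ']'])); // ['e-mail me', 'Github']
```

Without redaction the space stops the `*` inside `[e-mail me]`, so only `[Github]` is found; masking the space lets both links match, and post-redacting the brackets leaves just the link texts.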
Authors
- Johan Fredrik Varen
License
MIT License