Download the PHP package squirrelphp/strings without Composer

On this page you can find all versions of the php package squirrelphp/strings. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After the download, you have to make one include require_once('vendor/autoload.php');. After that you have to import the classes with use statements.

Example:
If you use only one package a project is not needed. But if you use more then one package, without a project it is not possible to import the classes with use statements.

In general, it is recommended to use always a project to download your libraries. In an application normally there is more than one library needed.
Some PHP packages are not free to download and because of that hosted in private repositories. In this case some credentials are needed to access such packages. Please use the auth.json textarea to insert credentials, if a package is coming from a private repository. You can look here for more information.

  • Some hosting areas are not accessible by a terminal or SSH. Then it is not possible to use Composer.
  • To use Composer is sometimes complicated. Especially for beginners.
  • Composer needs much resources. Sometimes they are not available on a simple webspace.
  • If you are using private repositories you don't need to share your credentials. You can set up everything on our site and then you provide a simple download link to your team member.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as binary. This makes your build process faster and you don't need to expose your credentials for private repositories.
Please rate this library. Is it a good library?

Informations about the package strings

Squirrel Strings

Build Status Test Coverage PHPStan Packagist Version PHP Version

Handles common string operations in PHP applications:

Installation

Alternatively squirrelphp/strings-bundle can be installed for easy integration in Symfony projects, as it provides many servides for dependency injection by default and lets you easily register additional functionality based on this library.

Filter a string

Filters a string and returns back a string, using the Squirrel\Strings\StringFilterInterface interface:

Each filter does exactly one thing (ideally) so they can be combined depending on how an input needs to be changed / processed.

Additional filters can easily be defined by implementing Squirrel\Strings\StringFilterInterface. Possible ideas for custom filters in applications:

This library has some basic filters which cover a lot of ground and can be used individually or combined. Each filter class ends with a Filter suffix, which is not mentioned in the following list in order to keep the titles and texts more readable.

Newlines, tabs and spaces

Normalizing newlines and removing excess spaces is helpful to not waste space before storing data and to always end up with consistent data, especially if you want to transform spaces and newlines to HTML or some other format later.

NormalizeNewlinesToUnixStyle

Replaces all types of newlines (in unicode there are currently 8 different newline characters) with unix newlines (\n) so you will not have a mixture of different newlines with unexpected results.

ReplaceUnicodeWhitespaces

Replaces all unicode whitespace characters (currently 16 different ones) with a regular space if you do not care about the minute differences between the spaces (like width, or non-breaking, or mathematical). For most input this makes sense in order to be able to trim and limit unnecessary spaces.

RemoveExcessSpaces

Removes any unnecessary spaces, which are:

Just operates on regular spaces (unicode 0020, decimal 32) and ignores other unicode whitespaces.

LimitConsecutiveUnixNewlines

Limits the number of unix newlines which can appear right after each other. This only handles unix newlines and does not factor in spaces between newlines, so running the above three filters first makes sense.

The first argument of this filter is the number of consecutive newlines allowed, and it defaults to two, but can be set to any number above zero.

RemoveZeroWidthSpaces

Zero width spaces are usually not something you want to save in a database, so this filter removes the three main zero width spaces defined in unicode.

ReplaceNewlinesWithSpaces

If you do not need any newlines in a text or they are not allowed, this filter replaces any type of newline (in unicode there are currently 8 different newline characters) with a space (to avoid the risk of combining content which is only separated by a newline). Internally this uses the NormalizeNewlinesToUnixStyle filter first and then replaces the unix style newlines with spaces.

Trim

Trims characters from the beginning and end of the string, using the PHP trim function if the characters given to the constructor are only ASCII, or using regex if unicode characters are trimmed.

By default (if no constructor argument is used) the same characters are trimmed as the PHP trim function trims by default, meaning: " \t\n\r\0\x0B" (ordinary space, horizontal tab, new line, carriage return, NUL byte and vertical tab)

ReplaceTabsWithSpaces

Replaces all horizontal tabs with spaces. This is the only filter that deals with horizontal tabs, as tabs might have a different/specific meaning compared to the other unicode spaces.

WrapLongWordsNoHTML

Long sequences of characters without a breaking character (like a space or newline) can break layouts and be difficult to display, and it can easily occur in user input or even by accident in regular content you write yourself.

This filter adds a zero-width space after a certain amount of characters (20 by default) in which no unix newlines or regular spaces occur. So if there is ample space even long words are not broken up, but if the space is tight the long word is split up into multiple lines.

This variant of the filter assumes no HTML is allowed, so it may break up any long sequence of characters.

WrapLongWordsWithHTML

Long sequences of characters without a breaking character (like a space or newline) can break layouts and be difficult to display, and it can easily occur in user input or even by accident in regular content you write yourself.

This filter adds a zero-width space after a certain amount of characters (20 by default) in which no unix newlines or regular spaces occur. So if there is ample space even long words are not broken up, but if the space is tight the long word is split up into multiple lines.

This variant of the filter looks for HTML tags in the string. If there are none, it behaves like WrapLongWordsNoHTML, if HTML tags do occur, each one is temporarily replaced by a substitute character and only counts as one character, and will not be broken up by the filter. So words might be split up "too early" when many HTML tags occur, as HTML tags count as one character for wrapping.

Cases: lowercase, uppercase, camelcase, snakecase

Lowercase

Converts all unicode characters to their lowercase equivalent.

Uppercase

Converts all unicode characters to their uppercase equivalent.

UppercaseFirstCharacter

Convert the first character in the string to uppercase, correctly handles a unicode character as first character.

UppercaseWordsFirstCharacter

Convert the first character of every word in the string to uppercase, correctly handles unicode characters.

CamelCaseToSnakeCase

Convert from CamelCase to snake_case. Only supports alphanumeric characters (A-Z, a-z, 0-9), ignores all others!

SnakeCaseToCamelCase

Convert from snake_case to CamelCase. Only supports alphanumeric characters (A-Z, a-z, 0-9), ignores all others!

HTML

RemoveHTMLTags

Removes all HTML tags. If HTML tags are malformed this might remove more than expected, as it does not try to validate the HTML, it just removes anything that looks like a HTML tag.

RemoveHTMLTagCharacters

Removes the three main characters used in HTML tags: < , > and "

ReplaceUnixStyleNewlinesWithParagraphs

For HTML you often want to process newlines in a predictable way, this filter is one possibility:

For simple content without block level HTML tags this is often ideal to structure text and show it on a HTML page.

EncodeBasicHTMLEntities

Encode &"'<> into their HTML entities (&amp;, &quot;, &apos;, &lt;, &gt;), which is mainly helpful for correctly and securely displaying text in a HTML context.

DecodeBasicHTMLEntities

Does the reverse of EncodeBasicHTMLEntities, sensible if you know input might contain HTML entities and you want to streamline the text and avoid something like &amp;amp;.

DecodeAllHTMLEntities

This decodes all HTML entities according to the HTML5 standard (using html_entity_decode internally). This is usually not necessary but might make sense if you receive text and know it contains a lot of HTML entities and you do not know exactly which or how many.

Remove/restrict characters and content

RemoveNonUTF8Characters

Removes any characters which are not valid according to the UTF8 specification. This filter is recommended for anything coming from outside of your application (user input, web services, data import) so you can continue to operate on a valid UTF8 string afterwards. Most string functions or databases will otherwise reject a string if it contains invalid characters.

If invalid UTF8 characters must never appear in your application it might make sense to instead check the encoding in your application and throw an exception in this way:

Yet this might be overkill for generic user input, where you just want to try to work things out even if the input is partly malformed (might be better than to fail completely).

RemoveNonAlphanumeric

Remove any characters which are not letters or numbers, so only A-Z, a-z and 0-9 are allowed. Can be handy for tokens, parts of an URL, an entered code, or other things where you know no other characters are allowed and you just want to ignore anything non-alphanumeric.

RemoveNonAlphabetic

Remove any characters which are not letters, so only A-Z and a-z are allowed. Can be handy for tokens, country or language codes, or other things where you know no other characters are allowed and you just want to ignore them.

RemoveNonNumeric

Remove any characters which are not numbers, so only 0-9 are allowed. Can be handy for tokens, parts of an URL, an entered code, or other things where you know no other characters are allowed and you just want to ignore them.

RemoveNonAsciiAndControlCharacters

Remove any characters which are not letters, numbers and basic ASCII characters, so only A-Z, a-z, 0-9, space and !"#$'()*+,-./:;<=>?@[\]^_`{|}~ are allowed (no newlines, control characters, or unicode - all that is removed).

RemoveEmails

Remove anything that looks like an email, meaning any string part with non-space characters before and after an @ symbol, so this is quite "greedy".

Originally added to be able to analyze texts and detect the language, where email addresses would only confuse a language detection algorithm, so removing anything that looks like an email from a string should lead to "just text" or at least more analyzable text.

RemoveURLs

Remove anything that looks like an URL, meaning any string part that starts with a valid looking scheme, followed by "://", followed by zero or more non-space characters.

Originally added to be able to analyze texts and detect the language, where URLs would only confuse a language detection algorithm, so removing anything that looks like an URL from a string should lead to "just text" or at least more analyzable text.

Normalize to ASCII

Sometimes unicode with its plethora of characters can be a hindrance - for example in these cases:

Beware: This works well for countries and languages with latin characters (like most of Europe, North America, South America, most of Africa, Australia), yet not so well with other scripts, like Cyrillic, Arabic, Hanzi or Greek, to name just a few.

NormalizeLettersToAscii

Reduces most letters to their base latin ASCII character (A-Z, a-z), if it is possible, so é becomes e, Â becomes A, etc. It is very thorough and uses both the Normalizer from the Intl extension and a long list of custom conversions. Some characters are converted to two ASCII characters (like Æ => AE, or ß to ss), so your string might get longer.

NormalizeToAlphanumeric

Runs NormalizeLettersToAscii from above and then removes any non-alphanumeric characters, so:

NormalizeToAlphanumericLowercase

Runs NormalizeToAlphanumeric from above and then converts all characters to lowercase, so:

If you process both the user inputs and the known values in your database in this manner, you can match them and get more matches/results, as spaces, dashes, diacritics etc. are not taken into account.

ReplaceNonAlphanumeric

Sometimes you do not want to remove non-alphanumeric characters but instead replace them with a character, for example for URLs you want to convert L'école d'Humanité to l-ecole-d-humanite.

This is what ReplaceNonAlphanumeric does by default - replace all non-alphanumeric characters with a dash, and if multiple non-alphanumeric characters occur in sequence they are replaced by just one dash.

In the contructor of ReplaceNonAlphanumeric you can set another replacement character instead of a dash - for example a dot, or a slash, depending on your use case.

Streamline input

These filters combine other filters into a sensible package to run on user input, and is more of an example than something you might want to use directly in your application.

You can do your own combination of filters by using the Squirrel\Strings\StringFilterRunner class.

StreamlineInputWithNewlines

Runs the following filters:

This makes sure the string is valid UTF8 and normalizes all whitespace characters and removes unnecessary whitespace characters, while leaving the content itself alone (works with or without HTML, does not convert HTML entities).

StreamlineInputNoNewlines

Runs the following filters:

Basically the same as StreamlineInputWithNewlines but newlines are converted to spaces. This is good for common user input like names, emails addresses and any other fields where newlines make no sense.

Test a string

Tests a string and returns back true or false (whether the test was successful), using the Squirrel\Strings\StringTesterInterface interface:

Additional testers can easily be defined by implementing Squirrel\Strings\StringTesterInterface. Possible ideas for custom testers in applications:

This library has two default testers. Each tester class ends with a Tester suffix, which is not mentioned in the following list in order to keep the titles and texts more readable.

ValidUTF8

Checks that only valid UTF8 characters are contained within a string. If your application wants to be strict about external data it can make sense to reject any data with non-UTF8 characters (a less strict way of dealing with non-UTF8 characters would be the RemoveNonUTF8Characters filter).

ValidDateTime

Checks the string according to a datetime format accepted by the date function given in the constructor of this class (default is Y-m-d for the ISO date format with dashes between year, month and day) and makes sure the given date exists (2021-02-29 would return false, yet 2020-02-29 would return true). When validating input this makes it easy to ensure a date is in the format you expect and can be used for further processing.

Generate a random string

Generates random strings according to a list of possible characters allowed in the string.

With the two included classes (one with unicode support, one for ASCII-only) it is easy to define a random generator with your own set of characters which should be allowed to appear in a random string. These are sensible values:

Defining your own range of possible characters is easy, and even unicode characters can be used.

Condense a string into a number

Convert an integer to a string with a given "character set" - this way we can encode an integer to condense it (so an integer with 8 numbers is now only a 4-character-string) and later convert it back when needed.

The main use case are tokens in URLs, so less space is needed, as even large numbers become short strings if you use 36 or 62 values per character: with 62 possible characters (ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789) a string which is three characters long can cover numbers up to 238'328, with five characters you can cover numbers up to 916'132'832.

A side benefit of condensing is that it becomes less obvious an integer is used - tokens just look random and do not divulge their intent.

Defining your own range of possible characters is easy, and even unicode characters can be used.

URL

The URL class accepts an URL in the constructor and then lets you get or change certain parts of the URL to do the following:

This can be used to easily build or change your URLs, or to sanitize certain parts of a given URL, for example when redirecting: use the relative URL instead of the absolute URL to avoid malicious redirecting to somewhere outside of your control.

Regex wrapper

Using the built-in preg_match, preg_match_all, preg_replace and preg_replace_callback PHP functions often makes code less readable and harder to understand for static analyzers because of its uses of references ($matches) and the many possible return values. Squirrel\Strings\Regex wraps the basic functionality of these preg functions, creates easier to understand return values and throws a Squirrel\Strings\Exception\RegexException if anything goes wrong. These are the available static methods for the Regex class:

Regex::isMatch(string $pattern, string $subject, int $offset): bool

Wraps preg_match to check if $pattern exists in $subject.

Regex::getMatches(string $pattern, string $subject, int $flags, int $offset): ?array

Wraps preg_match_all to retrieve all occurences of $pattern in $subject with PREG_UNMATCHED_AS_NULL flag always set and the possibility to add additional flags. Returns null if no matches are found, otherwise the array of results as set by preg_match_all for $matches.

Regex::replace(string|array $pattern, string|array $replacement, string $subject, int $limit): string

Wraps preg_replace to replace occurences of $pattern with $replacement and only accepts a string as $subject.

Regex::replaceArray(string|array $pattern, string|array $replacement, array $subject, int $limit): array

Wraps preg_replace to replace occurences of $pattern with $replacement and only accepts an array as $subject.

Regex::replaceWithCallback(string|array $pattern, callback $callback, string $subject, int $limit, int $flags): string

Wraps preg_replace_callback to call a callback with the signature function(array $matches): string for each occurence of $pattern in $subject and only accepts a string as $subject.

Regex::replaceArrayWithCallback(string|array $pattern, callback $callback, array $subject, int $limit, int $flags): array

Wraps preg_replace_callback to call a callback with the signature function(array $matches): string for each occurence of $pattern in $subject and only accepts an array as $subject.


All versions of strings with dependencies

PHP Build Version
Package Version
Requires php Version >=8.1
ext-mbstring Version *
squirrelphp/debug Version ^2.0
Composer command for our command line client (download client) This client runs in each environment. You don't need a specific PHP version etc. The first 20 API calls are free. Standard composer command

The package squirrelphp/strings contains the following files

Loading the files please wait ....