Download the PHP package imtigger/unicode-filter without Composer
Unicode Filter
A PHP library for filtering Unicode strings, based on the Unicode blocks defined by the Unicode 11.0 standard.
Usage
Basic Usage
UnicodeFilter::whitelist($input, $filters = [], $excepts = [], $replacement = '')
- Keep only characters in the BASIC_LATIN block
- Keep only characters in the BASIC_LATIN block and replace everything else with an underscore "_"
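As a rough illustration of the whitelist behaviour described above, here is a standalone plain-PHP sketch. It is not the library's own code: the names `sketchWhitelist`, `inAnyRange`, and `SKETCH_BASIC_LATIN` are hypothetical, and only the parameter order is borrowed from the `whitelist()` signature. The Basic Latin range U+0000..U+007F comes from the Unicode standard. The sketch assumes ext-mbstring and PHP 7.4 or later, and it assumes that `$excepts` removes characters that would otherwise be kept.

```php
<?php
// Standalone sketch; NOT the library's implementation.
// Requires ext-mbstring and PHP 7.4+ (mb_str_split).

/** True if $cp equals a listed codepoint or lies within a listed [start, end] range. */
function inAnyRange(int $cp, array $ranges): bool
{
    foreach ($ranges as $r) {
        // A bare integer is treated as a one-codepoint range.
        [$start, $end] = is_array($r) ? $r : [$r, $r];
        if ($cp >= $start && $cp <= $end) {
            return true;
        }
    }
    return false;
}

/** Hypothetical helper that mirrors the parameter order of UnicodeFilter::whitelist(). */
function sketchWhitelist(string $input, array $filters = [], array $excepts = [], string $replacement = ''): string
{
    $out = '';
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        // Keep a character only if it is whitelisted and not excepted.
        $keep = inAnyRange($cp, $filters) && !inAnyRange($cp, $excepts);
        $out .= $keep ? $char : $replacement;
    }
    return $out;
}

// Basic Latin is the block U+0000..U+007F.
const SKETCH_BASIC_LATIN = [0x0000, 0x007F];

echo sketchWhitelist('Hello 世界!', [SKETCH_BASIC_LATIN]), "\n";          // Hello !
echo sketchWhitelist('Hello 世界!', [SKETCH_BASIC_LATIN], [], '_'), "\n"; // Hello __!
// Excepting U+00..U+20 also replaces the tab character (U+0009).
echo sketchWhitelist("Hello\tWorld", [SKETCH_BASIC_LATIN], [[0x00, 0x20]], '_'), "\n"; // Hello_World
```

With the package installed, the equivalent call would presumably pass `UnicodeFilter::BASIC_LATIN` as the filter entry instead of a hand-written range.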
UnicodeFilter::blacklist($input, $filters = [], $excepts = [], $replacement = '')
- Remove only characters in the EMOTICONS block
- Return true/false depending on whether the string is processed
UnicodeFilter::isWhitelistProcessed($input, $filters = [], $excepts = [])
UnicodeFilter::isBlacklistProcessed($input, $filters = [], $excepts = [])
$filters and $excepts accept an array whose entries may be in any of the following formats:
- Block name (e.g. UnicodeFilter::BASIC_LATIN)
- Arbitrary integer codepoint (e.g. 0x200b, mb_ord("好"))
- Arbitrary integer codepoint range (e.g. [0x2000, 0x200F])
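To make the single-codepoint and range formats concrete, the following standalone sketch normalises both into `[start, end]` pairs and applies them as a blacklist. It is an illustration under stated assumptions, not the library's code: `normaliseFilters` and `sketchBlacklist` are hypothetical names, block-name constants are left out, and the Emoticons range U+1F600..U+1F64F is taken from the Unicode standard. It also assumes that "processed" means the filter changed the string, which the source does not spell out. Requires ext-mbstring and PHP 7.4 or later.

```php
<?php
// Standalone sketch; NOT the library's implementation.

/** Normalise mixed entries (int codepoint or [start, end] pair) to [start, end] pairs. */
function normaliseFilters(array $filters): array
{
    return array_map(
        fn($f) => is_array($f) ? [(int) $f[0], (int) $f[1]] : [(int) $f, (int) $f],
        $filters
    );
}

/** Hypothetical helper: replace characters matched by $filters unless $excepts also matches. */
function sketchBlacklist(string $input, array $filters = [], array $excepts = [], string $replacement = ''): string
{
    $hits = function (int $cp, array $pairs): bool {
        foreach ($pairs as [$start, $end]) {
            if ($cp >= $start && $cp <= $end) {
                return true;
            }
        }
        return false;
    };
    $filters = normaliseFilters($filters);
    $excepts = normaliseFilters($excepts);

    $out = '';
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        $remove = $hits($cp, $filters) && !$hits($cp, $excepts);
        $out .= $remove ? $replacement : $char;
    }
    return $out;
}

$emoticons = [0x1F600, 0x1F64F];           // range format
$zeroWidthSpace = 0x200B;                  // single-codepoint format
$input = "Hi\u{200B} there \u{1F603}";

$clean = sketchBlacklist($input, [$emoticons, $zeroWidthSpace]);
var_dump($clean === 'Hi there ');          // bool(true)
// Assumed meaning of "processed": the output differs from the input.
var_dump($clean !== $input);               // bool(true)
```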
Advanced Usage
- Keep only characters in the BASIC_LATIN block, except the range U+00..U+20, and replace everything else with an underscore "_"
- Keep only (most) characters used in English, Chinese, Japanese and Korean
- Keep only (most) characters used in English, Chinese, Japanese, Korean and Thai, plus General Punctuation and an additional 😃 character, but exclude the ranges U+2000..U+200F and U+205F..U+206F (unprintable characters), and replace all other characters with an underscore
- Generate an array of details (codepoint and block) for each character of the given string
analysis($string)
- Generate details of how the whitelist/blacklist is processed, together with its results
whitelistInfo($input, $filters = [], $excepts = [], $replacement = '')
blacklistInfo($input, $filters = [], $excepts = [], $replacement = '')
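The per-character details mentioned above (codepoint and block) can be approximated with a few lines of plain PHP. This is only a sketch of the idea, not the output format of the library's `analysis()`: the function name `sketchAnalysis`, the array keys, and the four-entry block table `SKETCH_BLOCKS` are all hypothetical, and a real table would cover every block in the standard. The ranges themselves are taken from the Unicode standard. Requires ext-mbstring and PHP 7.4 or later.

```php
<?php
// Standalone sketch; NOT the library's implementation or output format.

// A deliberately tiny block table for illustration.
const SKETCH_BLOCKS = [
    'BASIC_LATIN'            => [0x0000, 0x007F],
    'GENERAL_PUNCTUATION'    => [0x2000, 0x206F],
    'CJK_UNIFIED_IDEOGRAPHS' => [0x4E00, 0x9FFF],
    'EMOTICONS'              => [0x1F600, 0x1F64F],
];

/** Return one row per character: the character, its codepoint, and its block (or null). */
function sketchAnalysis(string $input): array
{
    $rows = [];
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        $block = null; // stays null if the codepoint is outside the small table
        foreach (SKETCH_BLOCKS as $name => [$start, $end]) {
            if ($cp >= $start && $cp <= $end) {
                $block = $name;
                break;
            }
        }
        $rows[] = [
            'char'      => $char,
            'codepoint' => sprintf('U+%04X', $cp),
            'block'     => $block,
        ];
    }
    return $rows;
}

print_r(sketchAnalysis("A好\u{1F603}"));
```

A listing like this is useful before writing a filter, since it shows which blocks a sample string actually touches.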
Debug Functions
- Dump whitelist/blacklist info to console
dumpWhitelistInfo($input, $filters = [], $excepts = [], $replacement = '')
dumpBlacklistInfo($input, $filters = [], $excepts = [], $replacement = '')
dumpString($string)
dumpFilters($filters = [])
Common Issues
- Whitelisting is the preferred approach, because there are too many characters to enumerate in a blacklist (137,374 characters as of Unicode 11.0)
- Some languages, especially Chinese and South-East Asian languages, have characters spread over multiple blocks. For example, there are CJK_COMPATIBILITY, CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A, CJK_UNIFIED_IDEOGRAPHS, CJK_COMPATIBILITY_IDEOGRAPHS... blocks. You may therefore need several rounds of testing to find all the blocks you actually need
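One way to run such a test is to check which blocks a representative sample of text falls into before settling on a whitelist. The sketch below does this for the four CJK blocks named above. It is a standalone illustration, not part of the library: `sketchCjkBlocksUsed` and `SKETCH_CJK_BLOCKS` are hypothetical names, and the ranges are taken from the Unicode standard. Requires ext-mbstring and PHP 7.4 or later.

```php
<?php
// Standalone sketch; NOT the library's implementation.

const SKETCH_CJK_BLOCKS = [
    'CJK_COMPATIBILITY'                  => [0x3300, 0x33FF],
    'CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A' => [0x3400, 0x4DBF],
    'CJK_UNIFIED_IDEOGRAPHS'             => [0x4E00, 0x9FFF],
    'CJK_COMPATIBILITY_IDEOGRAPHS'       => [0xF900, 0xFAFF],
];

/** List the CJK blocks (from the table above) that the input actually uses. */
function sketchCjkBlocksUsed(string $input): array
{
    $used = [];
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        foreach (SKETCH_CJK_BLOCKS as $name => [$start, $end]) {
            if ($cp >= $start && $cp <= $end) {
                $used[$name] = true; // array keys de-duplicate block names
                break;
            }
        }
    }
    return array_keys($used);
}

// U+597D is in the main block, U+3400 in Extension A, U+F900 in Compatibility Ideographs.
$sample = "好\u{3400}\u{F900}";
$used = sketchCjkBlocksUsed($sample);

// Blocks the sample needs that a whitelist of only CJK_UNIFIED_IDEOGRAPHS would miss.
$missed = array_values(array_diff($used, ['CJK_UNIFIED_IDEOGRAPHS']));
print_r($missed);
```

Here a whitelist containing only the main ideograph block would silently drop two of the three sample characters, which is the situation the note above warns about.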
Reference
All versions of unicode-filter with dependencies
ext-mbstring Version *