Download the PHP package bl4ckbon3/strsim without Composer
On this page you can find all versions of the php package bl4ckbon3/strsim. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: strsim
Short Description: Collection of string similarity and distance algorithms in PHP, including Levenshtein, Damerau-Levenshtein, Jaro-Winkler, and more
License: MIT
Homepage: https://github.com/Edgaras0x4E/StrSim
Information about the package strsim
StrSim v1.1.1
A collection of string similarity and distance algorithms implemented in PHP with full Unicode and multibyte character support. This library provides standalone static methods for computing various similarity metrics, useful in natural language processing, fuzzy matching, spell checking, and bioinformatics.
What's New in v1.1.1
🔧 Fixed Naming Issues
- Fixed Jaro::distance() - Previously returned similarity values (1.0 = identical), now correctly returns distance values (0.0 = identical)
- Fixed JaroWinkler::distance() - Previously returned similarity values (1.0 = identical), now correctly returns distance values (0.0 = identical)
✨ New Functions Added
- Jaro::similarity() - Returns proper similarity values (1.0 = identical, 0.0 = completely different)
- JaroWinkler::similarity() - Returns proper similarity values (1.0 = identical, 0.0 = completely different)
📚 Improvements
- Better MongeElkan - Fixed edge cases for empty string comparisons
🔄 Migration Guide
If you were using Jaro::distance() or JaroWinkler::distance() expecting similarity values (where 1.0 = identical):
- Before: Jaro::distance("hello", "hello") returned 1.0
- After: Use Jaro::similarity("hello", "hello") to get 1.0; Jaro::distance("hello", "hello") now returns 0.0
Requirements
- PHP 8.3+
- Composer
Installation
- Install the package bl4ckbon3/strsim via Composer.
- Include the Composer autoloader (vendor/autoload.php) in your script.
Features
- Full Unicode Support: All algorithms handle multibyte characters, emoji, combining marks, and complex grapheme clusters
- UTF-8 Validation: Automatic validation of input strings with clear error messages
- Error Handling: Proper exception types with descriptive messages
- Code-Point Based: Consistent behavior across all Unicode normalization forms
- Optimized Tokenization: Smart whitespace handling for text-based algorithms
- Distance vs Similarity: Clear distinction between distance measures (0 = identical) and similarity measures (1 = identical)
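To illustrate what UTF-8 validation and code-point based comparison mean in practice, here is a minimal, self-contained sketch. It is not StrSim's implementation: the function names codePoints() and codePointLevenshtein() are hypothetical, and the sketch assumes the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of StrSim): reject invalid UTF-8,
 * then split the string into an array of code points.
 */
function codePoints(string $s): array
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return mb_str_split($s, 1, 'UTF-8');
}

/**
 * Levenshtein distance counted in code points rather than bytes,
 * using two rolling rows of the dynamic-programming table.
 */
function codePointLevenshtein(string $a, string $b): int
{
    $x = codePoints($a);
    $y = codePoints($b);
    $prev = range(0, count($y));
    foreach ($x as $i => $cx) {
        $curr = [$i + 1];
        foreach ($y as $j => $cy) {
            $cost = ($cx === $cy) ? 0 : 1;
            $curr[] = min(
                $prev[$j + 1] + 1,  // deletion
                $curr[$j] + 1,      // insertion
                $prev[$j] + $cost   // substitution
            );
        }
        $prev = $curr;
    }
    return $prev[count($y)];
}

var_dump(codePointLevenshtein('kitten', 'sitting'));  // int(3)
var_dump(codePointLevenshtein("\u{00E9}", 'e'));      // int(1): one code point differs
var_dump(levenshtein("\u{00E9}", 'e'));               // int(2): PHP's built-in counts bytes
```

The last two lines show why a byte-based function gives misleading results for multibyte input: the accented letter occupies two bytes in UTF-8 but is a single code point.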
Supported Algorithms
| Class | Method | Return Range | Description |
|---|---|---|---|
| Levenshtein | distance() | 0 to ∞ | Number of insertions, deletions, or substitutions needed. |
| DamerauLevenshtein | distance() | 0 to ∞ | Levenshtein with transpositions included. |
| Hamming | distance() | 0 to ∞ | Number of differing positions (requires equal-length strings). |
| Jaro | similarity() | 0.0 to 1.0 | Similarity based on character matches and transpositions. |
| Jaro | distance() | 0.0 to 1.0 | Distance measure (1 - similarity). |
| JaroWinkler | similarity() | 0.0 to 1.0 | Jaro with a prefix match boost for similar string starts. |
| JaroWinkler | distance() | 0.0 to 1.0 | Distance measure (1 - similarity). |
| LCS | length() | 0 to ∞ | Length of the longest common subsequence. |
| SmithWaterman | score() | 0 to ∞ | Local alignment scoring for best-matching subsequences. |
| NeedlemanWunsch | score() | -∞ to ∞ | Global alignment scoring for entire string similarity. |
| Cosine | similarity() | 0.0 to 1.0 | Similarity via character frequency vectors. |
| Cosine | similarityFromVectors() | -1.0 to 1.0 | Cosine similarity for numeric vector inputs. |
| Jaccard | index() | 0.0 to 1.0 | Ratio of shared to total unique characters. |
| MongeElkan | similarity() | 0.0 to 1.0 | Average best-word similarity using Jaro-Winkler internally. |
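The two set- and vector-based rows of the table (Cosine over character frequency vectors, Jaccard over unique characters) can be sketched in a few lines each. The functions below are hypothetical illustrations written for this page, not the library's code; in particular, the return values chosen for empty input are assumptions and may differ from what StrSim does.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: count how often each code point occurs. */
function charFrequencies(string $s): array
{
    return array_count_values(mb_str_split($s, 1, 'UTF-8'));
}

/** Cosine similarity of the two character frequency vectors. */
function cosineSketch(string $a, string $b): float
{
    $fa = charFrequencies($a);
    $fb = charFrequencies($b);
    $dot = 0;
    foreach ($fa as $ch => $n) {
        $dot += $n * ($fb[$ch] ?? 0);
    }
    $norm = sqrt(array_sum(array_map(fn($n) => $n * $n, $fa)))
          * sqrt(array_sum(array_map(fn($n) => $n * $n, $fb)));
    // Assumption: an empty string has a zero vector, so return 0.0.
    return $norm > 0.0 ? $dot / $norm : 0.0;
}

/** Jaccard index: shared unique characters over all unique characters. */
function jaccardSketch(string $a, string $b): float
{
    $sa = array_unique(mb_str_split($a, 1, 'UTF-8'));
    $sb = array_unique(mb_str_split($b, 1, 'UTF-8'));
    $union = count(array_unique(array_merge($sa, $sb)));
    $inter = count(array_intersect($sa, $sb));
    // Assumption: two empty strings are treated as identical.
    return $union > 0 ? $inter / $union : 1.0;
}

var_dump(jaccardSketch('abc', 'bcd'));  // float(0.5): {b, c} shared out of {a, b, c, d}
var_dump(cosineSketch('abc', 'xyz'));   // float(0): no characters in common
```

Both measures ignore character order, so anagrams such as "listen" and "silent" score 1.0 under either sketch.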
Understanding Distance vs Similarity
This library provides both distance and similarity measures for certain algorithms:
- Distance measures: Return 0.0 for identical strings and higher values for more different strings
  - Examples: Levenshtein::distance(), Hamming::distance(), Jaro::distance(), JaroWinkler::distance()
- Similarity measures: Return 1.0 for identical strings and lower values for more different strings
  - Examples: Cosine::similarity(), Jaccard::index(), Jaro::similarity(), JaroWinkler::similarity()
For Jaro and Jaro-Winkler algorithms, both functions are available:
- similarity() returns values from 0.0 (completely different) to 1.0 (identical)
- distance() returns values from 0.0 (identical) to 1.0 (completely different)
- The relationship is: distance = 1.0 - similarity
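The relationship between the two Jaro functions can be made concrete with a small standalone sketch. The functions jaroSimilaritySketch() and jaroDistanceSketch() are hypothetical names used here for illustration; they follow the textbook Jaro definition and are not taken from the StrSim source, so edge-case behavior (for example, how empty strings are scored) is an assumption.

```php
<?php
declare(strict_types=1);

/** Textbook Jaro similarity over code points (illustrative sketch). */
function jaroSimilaritySketch(string $a, string $b): float
{
    $x = mb_str_split($a, 1, 'UTF-8');
    $y = mb_str_split($b, 1, 'UTF-8');
    $n = count($x);
    $m = count($y);
    if ($n === 0 && $m === 0) {
        return 1.0; // assumption: two empty strings count as identical
    }
    if ($n === 0 || $m === 0) {
        return 0.0;
    }
    // Characters match only if they are equal and close enough in position.
    $window = max(0, intdiv(max($n, $m), 2) - 1);
    $xMatched = array_fill(0, $n, false);
    $yMatched = array_fill(0, $m, false);
    $matches = 0;
    for ($i = 0; $i < $n; $i++) {
        $lo = max(0, $i - $window);
        $hi = min($m - 1, $i + $window);
        for ($j = $lo; $j <= $hi; $j++) {
            if (!$yMatched[$j] && $x[$i] === $y[$j]) {
                $xMatched[$i] = $yMatched[$j] = true;
                $matches++;
                break;
            }
        }
    }
    if ($matches === 0) {
        return 0.0;
    }
    // Walk both match sequences in order; mismatched pairs are half-transpositions.
    $k = 0;
    $outOfOrder = 0;
    for ($i = 0; $i < $n; $i++) {
        if (!$xMatched[$i]) {
            continue;
        }
        while (!$yMatched[$k]) {
            $k++;
        }
        if ($x[$i] !== $y[$k]) {
            $outOfOrder++;
        }
        $k++;
    }
    $t = $outOfOrder / 2;
    return ($matches / $n + $matches / $m + ($matches - $t) / $matches) / 3;
}

/** Distance is defined as the complement of similarity. */
function jaroDistanceSketch(string $a, string $b): float
{
    return 1.0 - jaroSimilaritySketch($a, $b);
}

var_dump(jaroSimilaritySketch('hello', 'hello'));  // float(1)
var_dump(jaroDistanceSketch('hello', 'hello'));    // float(0)
```

For any pair of inputs the two values sum to 1.0, which is the property the v1.1.1 naming fix restores for distance().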