YetiSearch
A powerful, pure-PHP search engine library with advanced full-text search capabilities, designed for modern PHP applications.
Table of Contents
- Features
- Requirements
- Installation
- Quick Start
- Usage Examples
  - Basic Indexing
  - Advanced Indexing
  - Search Examples
  - Document Management
- Configuration
- Advanced Features
  - Document Chunking
  - Field Boosting and Exact Match Scoring
  - Multi-language Support
  - Custom Stop Words
  - Geo-Spatial Search
  - Search Result Deduplication
  - Highlighting
  - Fuzzy Search
  - Faceted Search
- Architecture
- Testing
- API Reference
- Performance
  - Benchmark Results
  - Performance Characteristics
  - Performance Tuning
  - Bottlenecks and Solutions
  - Comparison with Other Solutions
  - Best Practices for Performance
- Future Features
- Contributing
- License
Features
- 🔍 Full-text search powered by SQLite FTS5 with BM25 relevance scoring
- 📄 Automatic document chunking for indexing large documents
- 🎯 Smart result deduplication - shows best match per document by default
- 🌍 Multi-language support with built-in stemming for multiple languages
- ⚡ Lightning-fast indexing and searching with SQLite backend
- 🔧 Flexible architecture with interfaces for easy extension
- 📊 Advanced scoring with intelligent field boosting and exact match prioritization
- 🎨 Search highlighting with customizable tags
- 🔤 Advanced fuzzy matching with multiple algorithms (Trigram, Jaro-Winkler, Levenshtein, Basic)
- 🎯 Enhanced multi-word matching for more accurate search results
- 🏆 Smart result ranking prioritizing exact matches over fuzzy matches
- 📈 Faceted search and aggregations support
- 📍 Geo-spatial search with R-tree indexing for location-based queries
- 🚀 Zero dependencies except PHP extensions and small utility packages
- 💾 Persistent storage with automatic database management
- 🔐 Production-ready with comprehensive test coverage
Requirements
- PHP 7.4 or higher
- SQLite3 PHP extension
- PDO PHP extension with SQLite driver
- Mbstring PHP extension
- JSON PHP extension
Installation
Install YetiSearch via Composer by requiring the yetidevworks/yetisearch package.
Quick Start
Usage Examples
Basic Indexing
Advanced Indexing
Search Examples
Multi-Index Search
Search across multiple indexes simultaneously:
Document Management
Configuration
Full Configuration Example
Advanced Features
Document Chunking
YetiSearch automatically splits large documents into smaller chunks for better search performance and relevance:
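To make the idea of chunking concrete, here is a minimal sketch of splitting text into overlapping chunks that break at whitespace. The `chunkText` helper, its parameters, and the overlap value are hypothetical and are not part of the YetiSearch API; only the 1000-character size echoes the default mentioned elsewhere in this document.

```php
<?php
// Hypothetical helper, NOT the YetiSearch implementation: splits text into
// chunks of at most $size characters, repeating $overlap characters between
// neighbouring chunks so phrases at a boundary stay searchable.
function chunkText(string $text, int $size = 1000, int $overlap = 100): array
{
    if ($size <= 0 || $overlap < 0 || $overlap >= $size) {
        throw new InvalidArgumentException('Require 0 <= overlap < size.');
    }

    $chunks = [];
    $length = mb_strlen($text);
    $start = 0;

    while ($start < $length) {
        $chunk = mb_substr($text, $start, $size);
        $isLast = ($start + $size >= $length);

        if (!$isLast) {
            // Back up to the last space so a word is not cut in half.
            $lastSpace = mb_strrpos($chunk, ' ');
            if ($lastSpace !== false && $lastSpace > $overlap) {
                $chunk = mb_substr($chunk, 0, $lastSpace);
            }
        }

        $chunks[] = $chunk;
        if ($isLast) {
            break;
        }

        // Each chunk is longer than $overlap, so $start always advances.
        $start += mb_strlen($chunk) - $overlap;
    }

    return $chunks;
}

$chunks = chunkText(str_repeat('word ', 500));
```

Each chunk would then be indexed as its own searchable unit that points back to the parent document.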
Field Boosting and Exact Match Scoring
YetiSearch provides intelligent field-weighted scoring with special handling for exact matches in high-priority fields:
How Field Boosting Works:
- Basic Boost Values: Each field's boost value multiplies its relevance score
- High-Priority Fields (boost ≥ 2.5): Get special exact match handling:
  - Exact field match: +50 point bonus (e.g., searching "Star Wars" finds a movie titled exactly "Star Wars")
  - Near-exact match: +30 point bonus (ignoring punctuation)
  - Length penalty: Shorter exact matches score higher than longer titles containing the phrase
- Phrase Matching: Exact phrases get 15x boost over individual word matches
Example:
This intelligent scoring ensures the most relevant results appear first, with exact matches in important fields (like titles or names) getting priority over partial matches in longer text.
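The boost and bonus rules described above can be sketched in plain PHP. The 2.5 threshold and the +50 and +30 bonuses come from the list above; the function names, the case-insensitive comparison, and the choice to add the bonus after multiplying by the boost are assumptions made for illustration, not the library's actual formula.

```php
<?php
// Illustrative only: lowercases, strips punctuation, collapses whitespace.
function normalizeForMatch(string $text): string
{
    $text = mb_strtolower($text);
    $text = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $text);
    return trim(preg_replace('/\s+/u', ' ', $text));
}

// Hypothetical scoring helper: multiplies the base relevance by the field
// boost, then adds an exact or near-exact bonus for high-priority fields.
function scoreField(string $query, string $fieldValue, float $baseScore, float $boost): float
{
    $score = $baseScore * $boost;

    if ($boost >= 2.5) {
        if (mb_strtolower(trim($fieldValue)) === mb_strtolower(trim($query))) {
            $score += 50.0; // exact field match
        } elseif (normalizeForMatch($fieldValue) === normalizeForMatch($query)) {
            $score += 30.0; // near-exact match, ignoring punctuation
        }
    }

    return $score;
}

$exact   = scoreField('Star Wars', 'Star Wars', 1.0, 3.0);
$near    = scoreField('star wars', 'Star Wars!', 1.0, 3.0);
$partial = scoreField('Star Wars', 'Star Wars: A New Hope', 2.0, 3.0);
```

Under these assumptions a title that equals the query outranks a longer title that merely contains it, even when the longer title has a higher base score.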
Enhanced Result Ranking (v1.0.3):
- Exact vs Fuzzy Priority: Regular matches always rank higher than fuzzy matches
- Shorter Match Preference: Among similar matches, shorter documents score higher
- Multi-word Query Handling: Improved matching for queries with multiple words
- Short Text Flexibility: Better handling of short text queries and matches
For more detailed information about scoring and configuration options, see the Field Boosting and Scoring Guide.
For comprehensive fuzzy search documentation, see the Fuzzy Search Guide.
Multi-language Support
Supported languages:
- English (default)
- French
- German
- Spanish
- Italian
- Portuguese
- Dutch
- Swedish
- Norwegian
- Danish
Custom Stop Words
You can add custom stop words to exclude specific terms from being indexed:
Custom stop words are applied in addition to the default language-specific stop words. They are case-insensitive and apply across all languages.
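The behaviour described above (custom words added on top of the defaults, matched case-insensitively) can be sketched as a simple token filter. The `removeStopWords` function is hypothetical and only illustrates the rule; it is not how YetiSearch's analyzer is implemented.

```php
<?php
// Hypothetical filter: drops any token found in either stop word list,
// comparing in lowercase so 'ACME' and 'acme' are treated alike.
function removeStopWords(array $tokens, array $defaultStopWords, array $customStopWords): array
{
    $stop = array_flip(array_map('mb_strtolower', array_merge($defaultStopWords, $customStopWords)));

    return array_values(array_filter($tokens, function (string $token) use ($stop): bool {
        return !isset($stop[mb_strtolower($token)]);
    }));
}

$kept = removeStopWords(
    ['The', 'Acme', 'rocket', 'is', 'fast'],
    ['the', 'is'],   // stand-in for language defaults
    ['ACME']         // custom additions
);
```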
Geo-Spatial Search
YetiSearch supports location-based searching using SQLite's R-tree spatial indexing:
Geo Utilities:
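As a rough illustration of the geometry behind radius searches, the sketch below computes a great-circle distance with the haversine formula and derives a bounding box of the kind an R-tree lookup can pre-filter on. Both function names are hypothetical stand-ins, not YetiSearch's own utilities, and the bounding box ignores the poles and the antimeridian.

```php
<?php
const EARTH_RADIUS_METERS = 6371000.0; // mean Earth radius

// Great-circle distance between two points, in metres (haversine formula).
function haversineDistanceMeters(float $lat1, float $lng1, float $lat2, float $lng2): float
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);

    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) ** 2;

    return 2 * EARTH_RADIUS_METERS * asin(min(1.0, sqrt($a)));
}

// Axis-aligned box that contains the circle around ($lat, $lng).
// Simplified: not valid near the poles or across the antimeridian.
function boundingBox(float $lat, float $lng, float $radiusMeters): array
{
    $dLat = rad2deg($radiusMeters / EARTH_RADIUS_METERS);
    $dLng = rad2deg($radiusMeters / (EARTH_RADIUS_METERS * cos(deg2rad($lat))));

    return [
        'north' => $lat + $dLat,
        'south' => $lat - $dLat,
        'east'  => $lng + $dLng,
        'west'  => $lng - $dLng,
    ];
}

$parisToLondon = haversineDistanceMeters(48.8566, 2.3522, 51.5074, -0.1278);
$box = boundingBox(0.0, 0.0, 111195.0);
```

A typical pattern is to use the box for a cheap index lookup and then apply the exact distance to discard candidates in the corners.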
Indexing with Bounds:
Search Result Deduplication
By default, YetiSearch deduplicates results to show only the best matching chunk per document:
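The effect of deduplication can be sketched as "keep the highest-scoring chunk per document, then sort". The hit structure used here (`document_id`, `chunk`, `score`) is an assumption for illustration and may not match the fields YetiSearch returns.

```php
<?php
// Hypothetical post-processing step: collapses chunk-level hits to one hit
// per document, keeping whichever chunk scored highest.
function dedupeByDocument(array $hits): array
{
    $best = [];
    foreach ($hits as $hit) {
        $docId = $hit['document_id'];
        if (!isset($best[$docId]) || $hit['score'] > $best[$docId]['score']) {
            $best[$docId] = $hit;
        }
    }

    $results = array_values($best);
    usort($results, function (array $a, array $b): int {
        return $b['score'] <=> $a['score']; // highest score first
    });

    return $results;
}

$results = dedupeByDocument([
    ['document_id' => 'a', 'chunk' => 0, 'score' => 1.2],
    ['document_id' => 'a', 'chunk' => 1, 'score' => 3.4],
    ['document_id' => 'b', 'chunk' => 0, 'score' => 2.0],
]);
```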
Highlighting
Search results can include highlighted matches:
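A minimal sketch of highlighting with customizable tags follows. The `highlight` function and its default `<mark>` tags are assumptions for illustration, not the library's API; it matches whole words case-insensitively and leaves HTML escaping to the caller.

```php
<?php
// Hypothetical highlighter: wraps each whole-word occurrence of any term in
// the given opening and closing tags, preserving the original casing.
function highlight(string $text, array $terms, string $open = '<mark>', string $close = '</mark>'): string
{
    $terms = array_filter(array_map('trim', $terms), 'strlen');
    if (!$terms) {
        return $text;
    }

    $alternatives = array_map(function (string $term): string {
        return preg_quote($term, '/');
    }, $terms);
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/iu';

    $result = preg_replace_callback($pattern, function (array $m) use ($open, $close): string {
        return $open . $m[0] . $close;
    }, $text);

    return $result ?? $text; // fall back to the input if the regex fails
}

$marked = highlight('PHP search with php', ['php']);
$custom = highlight('fast search', ['search'], '<em>', '</em>');
```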
Fuzzy Search
Enable fuzzy matching for typo tolerance:
Advanced Fuzzy Search Algorithms
YetiSearch supports multiple fuzzy matching algorithms for different use cases:
Available Fuzzy Algorithms:
- Trigram (Default) - Best overall accuracy and performance
  - Breaks words into 3-character sequences for matching
  - Excellent for most use cases
  - Good balance of speed and accuracy
- Jaro-Winkler - Optimized for short strings
  - Great for names, titles, and short text
  - Favors matches with common prefixes
  - Very fast performance
- Levenshtein - Edit distance algorithm
  - Counts insertions, deletions, and substitutions
  - Most flexible but requires term indexing
  - Best for handling complex typos
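To show how two of these measures behave on a typo, the sketch below compares trigram overlap with edit distance. The `trigrams` and `trigramSimilarity` helpers are hypothetical (they use space padding and a Jaccard ratio, which may differ from YetiSearch's internals); `levenshtein()` is PHP's built-in function, which counts bytes rather than multibyte characters.

```php
<?php
// Hypothetical helper: unique 3-character sequences of a padded, lowercased word.
function trigrams(string $word): array
{
    $padded = '  ' . mb_strtolower($word) . ' ';
    $grams = [];
    for ($i = 0, $n = mb_strlen($padded) - 2; $i < $n; $i++) {
        $grams[] = mb_substr($padded, $i, 3);
    }
    return array_values(array_unique($grams));
}

// Jaccard similarity over trigram sets: shared / (total distinct).
function trigramSimilarity(string $a, string $b): float
{
    $ta = trigrams($a);
    $tb = trigrams($b);
    $shared = count(array_intersect($ta, $tb));
    $total = count($ta) + count($tb) - $shared;
    return $total === 0 ? 0.0 : $shared / $total;
}

$similarity = trigramSimilarity('Stich', 'Stitch');
$distance = levenshtein('Cristopher', 'Christopher'); // one inserted character
```

Trigram similarity yields a ratio that can be thresholded, while edit distance yields a count of changes, which is why the Levenshtein option is configured with a maximum distance.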
Configuration Options:
- fuzzy_algorithm: Choose between 'trigram' (default), 'jaro_winkler', or 'levenshtein'
- levenshtein_threshold: Maximum edit distance allowed for Levenshtein (1-3 recommended)
  - 1 = Single character changes only (fastest)
  - 2 = Up to 2 character edits (balanced)
  - 3 = Up to 3 character edits (most flexible but slower)
- min_term_frequency: Minimum occurrences for a term to be considered for fuzzy matching
- max_indexed_terms: Maximum number of indexed terms to check (affects performance)
- max_fuzzy_variations: Maximum fuzzy variations generated per search term
- fuzzy_score_penalty: Score reduction factor for fuzzy matches (0.0 = no penalty, 1.0 = zero score)
- indexed_terms_cache_ttl: How long to cache the indexed terms list (seconds)
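For orientation, the options could be collected in an array like the one below. The option names come from the list above, but the values, the flat layout, and where such an array would be passed are assumptions; consult the library's configuration reference for the real structure and defaults. The `applyFuzzyPenalty` function is likewise a hypothetical reading of fuzzy_score_penalty as a linear reduction, chosen because it matches the two documented endpoints.

```php
<?php
// Illustrative values only; these are not documented defaults.
$fuzzyOptions = [
    'fuzzy_algorithm'         => 'levenshtein',
    'levenshtein_threshold'   => 2,      // up to 2 character edits
    'min_term_frequency'      => 2,
    'max_indexed_terms'       => 10000,
    'max_fuzzy_variations'    => 8,
    'fuzzy_score_penalty'     => 0.4,
    'indexed_terms_cache_ttl' => 300,    // seconds
];

// Assumed interpretation: 0.0 leaves the score unchanged, 1.0 reduces it to zero.
function applyFuzzyPenalty(float $score, float $penalty): float
{
    $penalty = max(0.0, min(1.0, $penalty)); // clamp to the documented range
    return $score * (1.0 - $penalty);
}

$penalized = applyFuzzyPenalty(10.0, $fuzzyOptions['fuzzy_score_penalty']);
```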
Performance Considerations:
Different algorithms have different performance characteristics:
- Trigram: Fast indexing and searching, no additional term indexing required
- Jaro-Winkler: Very fast, ideal for short text matching
- Levenshtein: Requires term indexing, impacting indexing performance (~295 docs/sec vs ~670 docs/sec)
Term indexing is only performed when fuzzy_algorithm is set to 'levenshtein'. For most use cases, 'trigram' provides the best balance of accuracy and performance.
Performance Optimization Tips:
Algorithm Benchmarking:
YetiSearch includes built-in benchmarking tools to help you choose the best fuzzy algorithm for your use case:
Faceted Search
Get aggregated counts for categories, tags, and similar fields:
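Conceptually, a facet is a count of how many matching documents carry each value of a field. The sketch below shows that aggregation over an in-memory result list; the `countFacet` function and the document shape are hypothetical and do not reflect how YetiSearch computes facets internally.

```php
<?php
// Hypothetical aggregation: counts occurrences of each value of $field,
// treating array-valued fields (such as tags) as multiple values.
function countFacet(array $documents, string $field): array
{
    $counts = [];
    foreach ($documents as $doc) {
        if (!isset($doc[$field])) {
            continue;
        }
        foreach ((array) $doc[$field] as $value) {
            $counts[$value] = ($counts[$value] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent value first
    return $counts;
}

$docs = [
    ['category' => 'books', 'tags' => ['php', 'search']],
    ['category' => 'books', 'tags' => ['php']],
    ['category' => 'music'],
];
$byCategory = countFacet($docs, 'category');
$byTag = countFacet($docs, 'tags');
```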
Architecture
YetiSearch follows a modular architecture with clear separation of concerns:
Key Components
- Analyzer: Tokenizes and processes text (stemming, stop words, etc.)
- Indexer: Manages document indexing and updates
- SearchEngine: Handles search queries and result processing
- Storage: Abstracts the storage backend (currently SQLite)
Testing
YetiSearch includes comprehensive test coverage. Run tests using various commands:
Basic Testing
Coverage Reports
Filtered Testing
Advanced Testing
Static Analysis
API Reference
YetiSearch Class
Document Structure
Documents are represented as associative arrays with the following structure:
Content vs Metadata
Understanding the distinction between content and metadata fields:
Content Fields:
- Are indexed and searchable - these fields are analyzed, tokenized, and can be found via search queries
- Affect relevance scoring - matches in content fields contribute to the document's search score
- Support field boosting - you can make certain fields more important for ranking
- Are returned in search results by default
- Examples: title, body, description, tags, author, category
Metadata Fields:
- Are NOT indexed - stored in the database but not searchable
- Don't affect search scoring - won't influence relevance ranking
- Are returned in results - currently included but could be made optional
- Useful for filtering - can still filter results by metadata values using filters
- Examples: prices, stock counts, internal IDs, timestamps, flags, view counts
When to use metadata:
This separation improves performance (less data to index), prevents false matches (searching "42" won't find products with 42 in stock), and keeps your search index focused on actual searchable content.
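The "searching 42" point can be reproduced directly against SQLite. This sketch uses FTS5's UNINDEXED column option through PDO to mimic the content and metadata split; the table layout is an assumption for illustration and is not YetiSearch's actual schema. It requires a SQLite build with FTS5 enabled.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// title and body are searchable content; stock is stored but not indexed.
$pdo->exec('CREATE VIRTUAL TABLE docs USING fts5(title, body, stock UNINDEXED)');

$insert = $pdo->prepare('INSERT INTO docs (title, body, stock) VALUES (?, ?, ?)');
$insert->execute(['Blue widget', 'A sturdy widget for everyday use', '42']);
$insert->execute(['Answer book', 'The answer is 42', '7']);

// Full-text query: only the indexed columns can match "42".
// bm25() weights title matches three times as heavily as body matches.
$search = $pdo->prepare(
    'SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs, 3.0, 1.0)'
);
$search->execute(['42']);
$textMatches = $search->fetchAll(PDO::FETCH_COLUMN);

// Metadata can still narrow results through an ordinary WHERE condition.
$filtered = $pdo->prepare('SELECT title FROM docs WHERE docs MATCH ? AND stock = ?');
$filtered->execute(['widget', '42']);
$filterMatches = $filtered->fetchAll(PDO::FETCH_COLUMN);
```

The first query returns only the document whose text contains "42", while the second shows the stock value remaining usable as a filter.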
SearchQuery Model
Result Structure
Search results are returned as an associative array:
Performance Tips
- Index Configuration
  - Use appropriate field boosts - don't over-boost
  - Only index fields you need to search
  - Use metadata for non-searchable data
  - Configure reasonable chunk sizes (default 1000 chars works well)
- Search Optimization
  - Use field-specific searches when possible: inFields(['title'])
  - Enable unique_by_route (default) to avoid duplicate documents
  - Use filters instead of text queries for exact matches
  - Limit results with reasonable page sizes
- Storage Optimization
  - Run optimize() periodically on large indexes
  - Use WAL mode for better concurrency (default)
  - Consider separate indexes for different content types
Error Handling
Performance
YetiSearch is designed for high performance with minimal resource usage. Here are real-world benchmarks and performance characteristics.
Benchmark Results
Tested on an M4 MacBook Pro with PHP 8.3, using a dataset of 32,000 movies:
Indexing Performance
| Operation | Performance | Details |
|---|---|---|
| Document Indexing | ~4,360 docs/sec | Without fuzzy term indexing |
| With Levenshtein | ~1,770 docs/sec | With term indexing for fuzzy search |
| Batch Processing | 250 docs/batch | Optimal batch size |
| Memory Usage | ~60MB | For 32k documents |
Search Performance
| Query Type | Response Time | Details |
|---|---|---|
| Simple Search | 2-5ms | Single term, no fuzzy |
| Phrase Search | 3-8ms | Multi-word queries |
| Fuzzy Search (Trigram) | 5-15ms | Default algorithm |
| Fuzzy Search (Levenshtein) | 10-30ms | Most accurate |
| Complex Queries | 15-50ms | With filters, facets, geo |
Real-World Example
From the movie database benchmark:
- Dataset: 32k movies with title, overview, genres
- Index Size: ~200MB on disk
- Indexing Time: 7.27 seconds (~4,420 movies/sec)
- Search Examples:
  - "Harry Potter" (exact) → results in 4.7ms
  - "Matrix" (exact) → results in 0.47ms
  - "Lilo and Stich" (fuzzy) → "Lilo & Stitch" in 26ms
  - "Cristopher Nolan" (fuzzy) → "Christopher Nolan" films in 32ms
Performance Characteristics
1. Linear Scalability
- Performance scales linearly with document count
- 100k documents ≈ 10x the time of 10k documents
- No exponential performance degradation
2. Memory Efficiency
- SQLite backend provides excellent memory management
- Only active data kept in memory
- Configurable cache sizes for different workloads
3. Disk I/O Optimization
- Write-Ahead Logging (WAL) for concurrent access
- Batch operations reduce disk writes
- Automatic index optimization
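The two disk-related points (WAL mode and batching) can be demonstrated with plain PDO. This is a sketch against a throwaway SQLite file, not YetiSearch's storage layer; the table and the `insertInBatches` helper are hypothetical, and the batch size of 250 simply mirrors the figure in the benchmark table.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'yeti_sketch_');
$pdo = new PDO('sqlite:' . $path);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Switch to write-ahead logging; the pragma returns the mode now in effect.
$journalMode = $pdo->query('PRAGMA journal_mode = WAL')->fetchColumn();
$pdo->exec('CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)');

// One transaction per batch means one commit per batch instead of per row.
function insertInBatches(PDO $pdo, array $titles, int $batchSize = 250): int
{
    $stmt = $pdo->prepare('INSERT INTO docs (title) VALUES (?)');
    $batches = 0;
    foreach (array_chunk($titles, $batchSize) as $batch) {
        $pdo->beginTransaction();
        foreach ($batch as $title) {
            $stmt->execute([$title]);
        }
        $pdo->commit();
        $batches++;
    }
    return $batches;
}

$titles = array_map(function (int $i): string {
    return "Movie $i";
}, range(1, 1000));

$batches = insertInBatches($pdo, $titles);
$count = (int) $pdo->query('SELECT COUNT(*) FROM docs')->fetchColumn();

// Close the connection and remove the temporary database files.
$pdo = null;
foreach ([$path, $path . '-wal', $path . '-shm'] as $file) {
    if (file_exists($file)) {
        unlink($file);
    }
}
```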
Performance Tuning
For Maximum Indexing Speed
For Fastest Searches
For Best Accuracy
Bottlenecks and Solutions
| Bottleneck | Impact | Solution |
|---|---|---|
| Large documents | Slow indexing | Increase chunk_size |
| Many small documents | I/O overhead | Increase batch_size |
| Complex queries | Slow searches | Add specific indexes |
| Fuzzy search | CPU intensive | Use trigram or basic algorithm |
| High concurrency | Lock contention | Enable WAL mode |
Comparison with Other Solutions
| Feature | YetiSearch | Elasticsearch | MeiliSearch | TNTSearch |
|---|---|---|---|---|
| Setup Time | < 1 min | 10-30 min | 5-10 min | < 1 min |
| Memory Usage | 50-200MB | 1-4GB | 200MB-1GB | 100-500MB |
| Dependencies | PHP only | Java + Service | Binary/Docker | PHP only |
| Index Speed | 4,500/sec | 10,000/sec | 5,000/sec | 2,000/sec |
| Search Speed | 1-30ms | 5-50ms | 10-100ms | 5-40ms |
Best Practices for Performance
- Index Design
  - Create separate indexes for different content types
  - Use appropriate field boosts
  - Only index searchable content
- Query Optimization
  - Use field-specific searches when possible
  - Limit results appropriately
  - Enable result caching for repeated queries
- Maintenance
  - Run optimize() during low-traffic periods
  - Monitor index size and split if needed
  - Clear old cache entries periodically
- Hardware Considerations
  - SSD storage recommended for large indexes
  - More RAM allows larger caches
  - Multi-core CPUs benefit batch operations
Future Feature Ideas
The following features are ideas for future releases:
Index Management Enhancements
- Index Aliases - Create aliases for indexes to simplify management and allow seamless index switching
- Index Templates - Define templates for consistent index configuration across similar content types
- Automatic Index Routing - Route documents to appropriate indexes based on document properties
- Real-time Index Synchronization - Synchronize data between multiple indexes in real-time
- Index Versioning and Migrations - Support for index schema evolution with migration tools
Language and Analysis
- Automatic Language Detection - Detect document language automatically instead of defaulting to English
- Custom Analyzer Plugins - Allow custom text analysis plugins for specialized content
- Phonetic Matching - Support for soundex/metaphone matching for name searches
- Synonym Support - Configure synonyms for enhanced search matching
Search Enhancements
- Query DSL - Advanced query language for complex search expressions
- Search Templates - Save and reuse common search patterns
- More Like This - Find similar documents based on content similarity
- Search Analytics - Built-in analytics for search queries and results
- Full Content Result - Option to return full document content in search results
Performance and Scalability
- Distributed Search - Support for searching across multiple YetiSearch instances
- Index Sharding - Split large indexes across multiple shards
- Query Caching Improvements - More sophisticated caching strategies
- Bulk Operations API - Optimized bulk indexing and updates
Integration Features
- Webhook Support - Notify external systems of index changes
- Import/Export Tools - Tools for data migration between different search systems
- REST API - HTTP API for remote access to YetiSearch functionality
- GraphQL Support - GraphQL endpoint for flexible data querying
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run tests (composer test:verbose)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
YetiSearch is maintained by the YetiSearch Team and contributors.
Special thanks to:
- The SQLite team for the excellent FTS5 extension
- The PHP community for continuous inspiration
- All contributors who help make YetiSearch better
All versions of yetisearch with dependencies
ext-json Version *
ext-mbstring Version *
ext-pdo Version *
ext-sqlite3 Version *
psr/log Version ^1.1