Download the PHP package mauricioperera/php-vector-store without Composer
On this page you can find all versions of the php package mauricioperera/php-vector-store. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package php-vector-store
Short Description Zero-dependency PHP vector database with BM25, hybrid search, Matryoshka, IVF indexing, and Int8 quantization
License MIT
Homepage https://github.com/MauricioPerera/php-vector-store
Information about the package php-vector-store
PHP Vector Store
Zero-dependency PHP vector database with BM25 full-text search, hybrid search (vector + text), Matryoshka progressive search, IVF indexing, Int8 quantization, and 1-bit binary quantization (32x compression). Pure PHP 8.1+ — no SQLite, no C extensions, no FFI.
Why
Most vector databases require C extensions (sqlite-vec), external services (Pinecone, Weaviate), or specific runtimes (Python). PHP Vector Store runs anywhere PHP runs — shared hosting, WordPress, Laravel, any framework.
New in v1.1: BinaryQuantizedStore, with 1-bit sign quantization, 96 bytes/vector (768d), brute-force search 27.8x faster than Int8, and Hamming distance search via XOR + popcount.
v0.2: BM25 full-text search, hybrid search fusion (RRF + Weighted), multiple distance metrics, StoreInterface for polymorphism, typed models, and a PHPUnit test suite.
Scaling Guide
| Vectors | Recommended Config | Storage/vec | Total (100K vectors) | Search time |
|---|---|---|---|---|
| <5K | Float32 768d + Matryoshka | 3,072 B | 300 MB | ~3ms |
| 5K-20K | Float32 384d + Matryoshka | 1,536 B | 150 MB | ~1.4ms |
| 20K-100K | Int8 384d + IVF + Matryoshka | 392 B | 38 MB | ~5ms |
| 100K-500K | Binary 768d + IVF + Matryoshka | 96 B | 9.4 MB | ~6ms |
| >500K | Use sqlite-vec or external service | — | — | — |
Quick Start
Features
Vector Storage (Float32, Int8 & Binary)
All three implement StoreInterface — use them interchangeably.
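To make the trade-off between the three formats concrete, here is a minimal sketch of Int8 and 1-bit sign quantization plus a Hamming distance computed with XOR and a popcount. It is not the library's code: the function names (`quantizeInt8`, `dequantizeInt8`, `quantizeBinary`, `hammingDistance`) and the per-vector min/scale scheme are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical Int8 quantizer: maps each float to an integer in [-128, 127]
 * using a per-vector minimum and scale (an assumed scheme, for illustration).
 *
 * @param float[] $v
 * @return array{codes: int[], min: float, scale: float}
 */
function quantizeInt8(array $v): array
{
    $min = (float) min($v);
    $max = (float) max($v);
    // A constant vector has zero range; use scale 1.0 to avoid dividing by zero.
    $scale = $max > $min ? ($max - $min) / 255.0 : 1.0;
    $codes = [];
    foreach ($v as $x) {
        $codes[] = (int) round(($x - $min) / $scale) - 128;
    }
    return ['codes' => $codes, 'min' => $min, 'scale' => $scale];
}

/** Approximate reconstruction of the original floats. */
function dequantizeInt8(array $q): array
{
    return array_map(
        fn (int $c): float => ($c + 128) * $q['scale'] + $q['min'],
        $q['codes']
    );
}

/** 1-bit sign quantization: one bit per dimension, packed 8 per byte. */
function quantizeBinary(array $v): string
{
    $bytes = '';
    foreach (array_chunk($v, 8) as $chunk) {
        $byte = 0;
        foreach ($chunk as $i => $x) {
            if ($x > 0) {
                $byte |= 1 << (7 - $i); // first dimension goes to the high bit
            }
        }
        $bytes .= chr($byte);
    }
    return $bytes;
}

/** Hamming distance between two packed bit strings of equal length. */
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Bit strings must have equal length');
    }
    if ($a === '') {
        return 0;
    }
    $distance = 0;
    // The ^ operator on two strings XORs them byte by byte.
    foreach (unpack('C*', $a ^ $b) as $byte) {
        while ($byte !== 0) {   // popcount: clear the lowest set bit each round
            $byte &= $byte - 1;
            $distance++;
        }
    }
    return $distance;
}
```

With this packing a 768-dimensional vector occupies 768 / 8 = 96 bytes, which matches the per-vector figure quoted for the Binary format.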
BM25 Full-Text Search
Okapi BM25 inverted index, collection-aware, with persistence.
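For readers unfamiliar with Okapi BM25, the following sketch scores a small in-memory corpus. It is a standalone illustration, not the package's `BM25\Index`: the function name `bm25Scores`, the defaults `k1 = 1.2` and `b = 0.75`, and the `ln(1 + ...)` IDF variant are assumptions, and a real inverted index would avoid rescanning every document per query.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical BM25 scorer over pre-tokenized documents.
 *
 * @param string[]               $queryTerms
 * @param array<string,string[]> $docs doc id => list of tokens
 * @return array<string,float>   doc id => score, best first (non-matching docs omitted)
 */
function bm25Scores(array $queryTerms, array $docs, float $k1 = 1.2, float $b = 0.75): array
{
    $n = count($docs);
    if ($n === 0) {
        return [];
    }
    $avgLen = array_sum(array_map('count', $docs)) / $n;

    // Document frequency: in how many documents each term appears.
    $df = [];
    foreach ($docs as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $scores = [];
    foreach ($docs as $id => $tokens) {
        $tf = array_count_values($tokens);
        $len = count($tokens);
        $score = 0.0;
        foreach (array_unique($queryTerms) as $term) {
            if (!isset($tf[$term])) {
                continue;
            }
            // IDF variant that stays positive even for very common terms.
            $idf = log(1 + ($n - $df[$term] + 0.5) / ($df[$term] + 0.5));
            // Term frequency saturation with document length normalization.
            $norm = $tf[$term] + $k1 * (1 - $b + $b * $len / $avgLen);
            $score += $idf * $tf[$term] * ($k1 + 1) / $norm;
        }
        if ($score > 0.0) {
            $scores[$id] = $score;
        }
    }
    arsort($scores);
    return $scores;
}
```

Documents that contain more of the query terms, and rarer ones, rank higher; `b` controls how strongly long documents are penalized.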
The SimpleTokenizer handles Unicode text with configurable stop words.
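A Unicode-aware tokenizer with stop words can be sketched in a few lines. This is not the package's `SimpleTokenizer`; the function name `tokenize`, the default stop-word list, and the use of `mb_strtolower` (which needs the bundled mbstring extension) are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical tokenizer: lowercases, splits on non-letter/non-digit
 * characters (Unicode-aware), and drops stop words.
 *
 * @param string[] $stopWords
 * @return string[]
 */
function tokenize(string $text, array $stopWords = ['the', 'a', 'an', 'and', 'of']): array
{
    $lower = mb_strtolower($text, 'UTF-8');
    // \p{L} = any Unicode letter, \p{N} = any Unicode number; /u enables UTF-8 mode.
    $parts = preg_split('/[^\p{L}\p{N}]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return []; // preg_split fails on invalid UTF-8 input
    }
    $stop = array_flip($stopWords);
    return array_values(array_filter(
        $parts,
        fn (string $token): bool => !isset($stop[$token])
    ));
}
```

Passing a different `$stopWords` array is enough to adapt the sketch to another language.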
Hybrid Search
Combines vector similarity with BM25 text relevance using fusion strategies.
RRF (Reciprocal Rank Fusion): score(d) = Σ 1/(k + rank(d)) — combines ranks from both legs without needing score normalization. Best default choice.
Weighted: Min-max normalizes both score sets to [0,1], then combined = w_vec * vecNorm + w_text * textNorm. Use when you want explicit control over the balance.
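Both fusion strategies described above can be sketched as plain functions. The names `reciprocalRankFusion`, `minMaxNormalize`, and `weightedFusion`, the default `k = 60`, and the equal default weights are illustrative assumptions, not the package's `HybridSearch` API or its defaults.

```php
<?php
declare(strict_types=1);

/**
 * RRF: score(d) = sum over rankings of 1 / (k + rank(d)), with 1-based ranks.
 *
 * @param string[][] $rankings each inner list holds doc ids, best first
 * @return array<string,float> doc id => fused score, best first
 */
function reciprocalRankFusion(array $rankings, int $k = 60): array
{
    $scores = [];
    foreach ($rankings as $ranking) {
        foreach (array_values($ranking) as $i => $docId) {
            $scores[$docId] = ($scores[$docId] ?? 0.0) + 1.0 / ($k + $i + 1);
        }
    }
    arsort($scores);
    return $scores;
}

/** Min-max normalize a doc id => score map to [0, 1]. */
function minMaxNormalize(array $scores): array
{
    if ($scores === []) {
        return [];
    }
    $min = min($scores);
    $range = max($scores) - $min;
    $normalized = [];
    foreach ($scores as $id => $score) {
        // If every score is identical, treat all documents as equally good.
        $normalized[$id] = $range > 0 ? ($score - $min) / $range : 1.0;
    }
    return $normalized;
}

/**
 * Weighted fusion: combined = wVec * vecNorm + wText * textNorm.
 * A document missing from one leg contributes 0 for that leg.
 */
function weightedFusion(array $vecScores, array $textScores, float $wVec = 0.5, float $wText = 0.5): array
{
    $vec = minMaxNormalize($vecScores);
    $text = minMaxNormalize($textScores);
    $combined = [];
    foreach (array_keys($vec + $text) as $id) {
        $combined[$id] = $wVec * ($vec[$id] ?? 0.0) + $wText * ($text[$id] ?? 0.0);
    }
    arsort($combined);
    return $combined;
}
```

RRF only needs the order of each result list, which is why it works without score normalization; the weighted variant needs the raw scores from both legs.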
Distance Metrics
Works with search(), matryoshkaSearch(), and searchAcross() on all three stores.
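The page does not list which metrics are available, so the sketch below simply shows three common ones (dot product, cosine similarity, Euclidean distance) as standalone functions. The function names are hypothetical and are not the package's `Math` class.

```php
<?php
declare(strict_types=1);

/** Dot product of two equal-length vectors. */
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $sum += $x * $b[$i];
    }
    return $sum;
}

/** Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros. */
function cosineSimilarity(array $a, array $b): float
{
    $normA = sqrt(dotProduct($a, $a));
    $normB = sqrt(dotProduct($b, $b));
    if ($normA <= 0.0 || $normB <= 0.0) {
        return 0.0;
    }
    return dotProduct($a, $b) / ($normA * $normB);
}

/** Euclidean (L2) distance; smaller means more similar. */
function euclideanDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $diff = $x - $b[$i];
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}
```

Note the direction of each metric: higher is better for cosine and dot product, lower is better for Euclidean distance, so result sorting has to follow the metric.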
IVF Clustering
K-means partitions vectors into clusters for sub-linear search.
Works with VectorStore, QuantizedStore, and BinaryQuantizedStore (via StoreInterface).
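The idea behind an inverted file (IVF) index can be sketched as follows: cluster the vectors with k-means, then at query time scan only the clusters whose centroids are closest to the query. This is a simplified illustration, not the package's `IVFIndex`; the function names, the deterministic initialization, and the `nProbe` parameter name are assumptions.

```php
<?php
declare(strict_types=1);

/** Squared Euclidean distance between two equal-length vectors. */
function sqDist(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $d = $x - $b[$i];
        $sum += $d * $d;
    }
    return $sum;
}

/** Group vector ids by the index of their nearest centroid. */
function assignToCentroids(array $vectors, array $centroids): array
{
    $lists = array_fill(0, count($centroids), []);
    foreach ($vectors as $id => $v) {
        $best = 0;
        $bestDist = INF;
        foreach ($centroids as $c => $centroid) {
            $d = sqDist($v, $centroid);
            if ($d < $bestDist) {
                $bestDist = $d;
                $best = $c;
            }
        }
        $lists[$best][] = $id;
    }
    return $lists;
}

/** Plain k-means; returns the centroids and the id list of each cluster. */
function buildIvf(array $vectors, int $k, int $iterations = 10): array
{
    // Deterministic init for the sketch: the first k vectors become the centroids.
    $centroids = array_slice(array_values($vectors), 0, $k);
    for ($it = 0; $it < $iterations; $it++) {
        foreach (assignToCentroids($vectors, $centroids) as $c => $ids) {
            if ($ids === []) {
                continue; // an empty cluster keeps its previous centroid
            }
            $mean = array_fill(0, count($centroids[$c]), 0.0);
            foreach ($ids as $id) {
                foreach ($vectors[$id] as $j => $x) {
                    $mean[$j] += $x / count($ids);
                }
            }
            $centroids[$c] = $mean;
        }
    }
    return ['centroids' => $centroids, 'lists' => assignToCentroids($vectors, $centroids)];
}

/** Scan only the nProbe closest clusters and return the topK nearest ids. */
function ivfSearch(array $query, array $vectors, array $index, int $nProbe, int $topK): array
{
    $centroidDist = [];
    foreach ($index['centroids'] as $c => $centroid) {
        $centroidDist[$c] = sqDist($query, $centroid);
    }
    asort($centroidDist);
    $results = [];
    foreach (array_slice(array_keys($centroidDist), 0, $nProbe) as $c) {
        foreach ($index['lists'][$c] as $id) {
            $results[$id] = sqDist($query, $vectors[$id]);
        }
    }
    asort($results);
    return array_slice($results, 0, $topK, true);
}
```

Raising `nProbe` trades speed for recall: probing every cluster is equivalent to a brute-force scan.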
Matryoshka Multi-Stage Search
Progressive refinement — each stage narrows candidates before the next.
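Speedup: 3-5x over brute-force with Int8. With Binary, Matryoshka search is 13.7x faster than Int8 Matryoshka search (6.3ms vs 86ms in the Performance table). Combined with IVF: 10-15x.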
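The staged refinement can be sketched like this: score every candidate on a short prefix of the embedding, keep the best few, then rescore only those survivors on a longer prefix. The function names `prefixCosine` and `multiStageSearch` and the `overfetch` factor are assumptions for illustration; the package's own `matryoshkaSearch()` may select candidates differently.

```php
<?php
declare(strict_types=1);

/** Cosine similarity computed over the first $dims dimensions only. */
function prefixCosine(array $a, array $b, int $dims): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    for ($i = 0; $i < $dims; $i++) {
        $dot += $a[$i] * $b[$i];
        $normA += $a[$i] * $a[$i];
        $normB += $b[$i] * $b[$i];
    }
    return ($normA > 0.0 && $normB > 0.0) ? $dot / sqrt($normA * $normB) : 0.0;
}

/**
 * Hypothetical multi-stage search.
 *
 * @param array<string,float[]> $vectors id => full-length vector
 * @param int[]                 $stages  increasing prefix lengths, e.g. [128, 384, 768]
 * @return array<string,float>  id => similarity at the final stage, best first
 */
function multiStageSearch(array $query, array $vectors, array $stages, int $topK, int $overfetch = 4): array
{
    $candidates = array_keys($vectors);
    $scores = [];
    $last = count($stages) - 1;
    foreach (array_values($stages) as $s => $dims) {
        $scores = [];
        foreach ($candidates as $id) {
            $scores[$id] = prefixCosine($query, $vectors[$id], $dims);
        }
        arsort($scores);
        // Early stages keep extra candidates; the final stage keeps exactly topK.
        $keep = $topK * $overfetch ** ($last - $s);
        $scores = array_slice($scores, 0, $keep, true);
        $candidates = array_keys($scores);
    }
    return $scores;
}
```

The saving comes from the first stage touching every vector but only a fraction of its dimensions, while later stages touch all dimensions of only a few vectors. This relies on embeddings trained so that prefixes remain meaningful.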
StoreInterface
VectorStore, QuantizedStore, and BinaryQuantizedStore all implement StoreInterface.
Typed Models
Typed Exceptions
Concurrency & Scaling Notes
File Locking
All flush() operations use flock(LOCK_EX) to prevent race conditions when multiple PHP processes write to the same collection simultaneously. This ensures atomic writes even under concurrent web requests.
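The locking pattern described here looks roughly like the following in plain PHP. The helper name `writeLocked` is hypothetical and this is a sketch of the general `flock(LOCK_EX)` technique, not the package's `flush()` implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: replace a file's contents while holding an exclusive lock.
 */
function writeLocked(string $path, string $data): void
{
    // Mode 'c' creates the file if needed but does not truncate it,
    // so existing content stays intact until the lock is held.
    $handle = fopen($path, 'c');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        if (!flock($handle, LOCK_EX)) { // blocks until other holders release
            throw new RuntimeException("Cannot lock {$path}");
        }
        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, $data);
        fflush($handle);               // push buffered data out before unlocking
        flock($handle, LOCK_UN);
    } finally {
        fclose($handle);
    }
}
```

On most Unix-like systems `flock` is advisory, so it only protects against other processes that also take the lock; a reader that wants a consistent view should acquire `LOCK_SH` before reading.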
Dimension Validation
set() throws DimensionMismatchException if the vector has fewer dimensions than the store was configured with. This catches mismatches early (e.g., passing a 384d vector to a 768d store).
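A check of this kind can be sketched as below. The class `DimensionMismatch` and the function `assertDimensions` are stand-ins defined for this example only; they are not the package's `DimensionMismatchException` or its `set()` method.

```php
<?php
declare(strict_types=1);

/** Stand-in exception type for this sketch. */
final class DimensionMismatch extends InvalidArgumentException
{
}

/**
 * Reject vectors that have fewer dimensions than the store expects.
 *
 * @param float[] $vector
 */
function assertDimensions(array $vector, int $expected): void
{
    $actual = count($vector);
    if ($actual < $expected) {
        throw new DimensionMismatch(
            "Expected at least {$expected} dimensions, got {$actual}"
        );
    }
}
```

Failing at write time keeps a 384d vector from silently landing in a 768d collection, where it would otherwise only surface later as wrong search results.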
JSON Manifest Scaling
Each collection stores its ID list and metadata in a .json sidecar file. For collections approaching 100K vectors, this manifest can grow large (~10-20 MB). Considerations:
- Memory: The entire manifest is loaded into memory on first access to a collection. For 100K vectors with metadata, budget ~50-100 MB of PHP memory.
- Latency: JSON decode of a large manifest adds ~50-200ms on first load (cached for subsequent operations within the same request).
- Mitigation: Use multiple collections (per entity type) to keep individual manifests small. A collection of 10K vectors has a ~1-2 MB manifest.
For datasets beyond 100K vectors, consider sqlite-vec or an external vector database.
API Reference
StoreInterface (VectorStore, QuantizedStore & BinaryQuantizedStore)
BM25\Index
HybridSearch
Options: fetchK, vectorWeight, textWeight, rrfK, dimSlice.
IVFIndex
Math (static)
Storage Format
Testing
57 tests across 6 suites: VectorStore, QuantizedStore, BinaryQuantizedStore, IVFIndex, BM25, HybridSearch.
Performance
Speed (1,000 vectors, bge-base 768d, PHP 8.2)
| Method | Int8 | Binary | Speedup (Binary vs Int8) |
|---|---|---|---|
| Brute-force 768d | 556ms | 20ms | 27.8x |
| Matryoshka 128→384→768 | 86ms | 6.3ms | 13.7x |
Storage
| Format | Per vector | 10K | 100K | 500K |
|---|---|---|---|---|
| Float32 768d | 3,072 B | 30 MB | 300 MB | 1.5 GB |
| Float32 384d | 1,536 B | 15 MB | 150 MB | 750 MB |
| Int8 768d | 776 B | 7.6 MB | 76 MB | 380 MB |
| Int8 384d | 392 B | 3.8 MB | 38 MB | 192 MB |
| Binary 768d | 96 B | 0.9 MB | 9.4 MB | 47 MB |
| Binary 384d | 48 B | 0.47 MB | 4.7 MB | 23 MB |
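The per-vector sizes in the table follow from simple arithmetic, reproduced in the sketch below. The function name `bytesPerVector` is hypothetical. The 8-byte Int8 overhead is inferred from the table itself (776 - 768 and 392 - 384); what those bytes contain is an assumption (for example two float32 values such as a minimum and a scale).

```php
<?php
declare(strict_types=1);

/**
 * Bytes needed to store one vector in each format.
 */
function bytesPerVector(string $format, int $dims): int
{
    return match ($format) {
        'float32' => 4 * $dims,             // 4 bytes per dimension
        'int8'    => $dims + 8,             // 1 byte per dimension + assumed 8-byte header
        'binary'  => intdiv($dims + 7, 8),  // 1 bit per dimension, rounded up to whole bytes
        default   => throw new InvalidArgumentException("Unknown format: {$format}"),
    };
}
```

For 768 dimensions this gives 3,072, 776, and 96 bytes, so Binary is 32x smaller than Float32 before any manifest or index overhead is counted.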
Integration Patterns
WordPress
Laravel
Neuron AI (RAG)
Architecture
License
MIT