PHP download

Download the PHP package ezimuel/phpvector without Composer

This page lists all versions of the PHP package ezimuel/phpvector. You can download and install any of them without Composer; dependencies are resolved automatically.

Vendor ezimuel
Package phpvector
Short Description A vector database in PHP implementing HNSW for approximate nearest-neighbor search and BM25 for hybrid full-text + vector retrieval.
License MIT

FAQ

After the download, add a single include, require_once('vendor/autoload.php');, and then import the classes you need with use statements.

If you use only one package, a project is not needed. If you use more than one package, you need a project; without one, the classes cannot be imported with use statements.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.

Some PHP packages are not free to download and are therefore hosted in private repositories. Accessing such packages requires credentials. If a package comes from a private repository, enter the credentials in the auth.json textarea.

Some hosting environments cannot be reached through a terminal or SSH, so Composer cannot be used there.
Composer can be complicated to use, especially for beginners.
Composer needs a lot of resources, which are sometimes not available on simple web hosting.
If you use private repositories, you do not need to share your credentials. You can set up everything on our site and then give your team members a simple download link.
You can simplify your Composer build process by using our command line tool to download the vendor folder as a binary. This makes the build faster, and you do not need to expose your credentials for private repositories.

Example code of ezimuel/phpvector

Information about the package phpvector

PHPVector

A pure-PHP vector database implementing HNSW (Hierarchical Navigable Small World) for approximate nearest-neighbour search and BM25 for full-text retrieval. Both engines can be combined into a single hybrid search pipeline.

Requirements

PHP 8.2+
No external PHP extensions required for core functionality
ext-pcntl (optional) — enables asynchronous document writes for lower insert latency

Installation

Quick start

1. Insert documents

A Document holds a dense embedding vector, optional raw text for BM25, and any metadata you want returned with results. The id field is optional — if omitted, a random UUID v4 is assigned automatically.
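To make the "random UUID v4" default concrete, here is a minimal sketch of how such an identifier can be generated in plain PHP. The helper name generateUuidV4() is hypothetical; the source does not show how PHPVector itself builds its ids.

```php
<?php
// Sketch only: generateUuidV4() is a hypothetical helper, not a PHPVector function.
function generateUuidV4(): string
{
    // 16 cryptographically secure random bytes (128 bits).
    $bytes = random_bytes(16);

    // Version field: high nibble of byte 6 must be 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Variant field: top two bits of byte 8 must be 10 (RFC 4122 variant).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    $hex = bin2hex($bytes);

    // Format as 8-4-4-4-12 hexadecimal groups.
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}
```

Supplying your own id is still useful when documents already have a stable key (for example a database primary key), because it lets you find the same document again later.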

2. Vector search

Find the k most similar documents to a query vector using HNSW.

3. Full-text search

Rank documents by BM25 relevance against a text query.
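For readers unfamiliar with BM25, the following standalone sketch shows how the score is usually computed. It does not call PHPVector: the function name bm25Scores(), the defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF formula are assumptions chosen for illustration, and the library's own defaults may differ.

```php
<?php
// Sketch only: a textbook BM25 scorer, independent of PHPVector.
// $docs maps a document id to its list of tokens; $queryTerms is a list of tokens.
function bm25Scores(array $docs, array $queryTerms, float $k1 = 1.5, float $b = 0.75): array
{
    $n = count($docs);
    if ($n === 0) {
        return [];
    }

    // Document lengths (array_map keeps the ids as keys) and their average.
    $lengths = array_map('count', $docs);
    $avgLen = array_sum($lengths) / $n;

    // Document frequency: in how many documents does each query term occur?
    $terms = array_unique($queryTerms);
    $df = array_fill_keys($terms, 0);
    foreach ($docs as $tokens) {
        foreach ($terms as $term) {
            if (in_array($term, $tokens, true)) {
                $df[$term]++;
            }
        }
    }

    $scores = [];
    foreach ($docs as $id => $tokens) {
        $tf = array_count_values($tokens);
        $score = 0.0;
        foreach ($terms as $term) {
            $f = $tf[$term] ?? 0;
            if ($f === 0) {
                continue; // term absent: contributes nothing
            }
            // Rare terms get a higher inverse document frequency.
            $idf = log(1 + ($n - $df[$term] + 0.5) / ($df[$term] + 0.5));
            // Term frequency saturates (k1) and is normalised by length (b).
            $norm = $f + $k1 * (1 - $b + $b * $lengths[$id] / $avgLen);
            $score += $idf * ($f * ($k1 + 1)) / $norm;
        }
        if ($score > 0) {
            $scores[$id] = $score;
        }
    }

    arsort($scores); // highest score first
    return $scores;
}
```

Documents that contain none of the query terms are left out of the result, which is why a full-text query can return fewer hits than a vector query over the same collection.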

4. Hybrid search

Fuse vector similarity and BM25 scores into a single ranked list.

Reciprocal Rank Fusion (recommended)

RRF is rank-based and scale-invariant — no tuning required.
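The idea behind Reciprocal Rank Fusion can be sketched in a few lines. This is a generic illustration, not PHPVector's implementation: reciprocalRankFusion() is a hypothetical name, and the smoothing constant k = 60 is the value commonly used in the literature, assumed here.

```php
<?php
// Sketch only: generic Reciprocal Rank Fusion, independent of PHPVector.
// Each entry of $rankedLists is a list of document ids, best match first.
function reciprocalRankFusion(array $rankedLists, int $k = 60): array
{
    $scores = [];
    foreach ($rankedLists as $list) {
        foreach (array_values($list) as $index => $id) {
            // Ranks are 1-based; only the position matters, never the raw score.
            $scores[$id] = ($scores[$id] ?? 0.0) + 1.0 / ($k + $index + 1);
        }
    }

    arsort($scores); // highest fused score first
    return $scores;
}
```

Because only ranks enter the formula, it does not matter that cosine similarities and BM25 scores live on completely different scales. A document that appears near the top of both lists ends up ahead of one that appears in only one of them.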

Weighted combination

Normalises both score ranges to [0, 1] then applies explicit weights.
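A minimal sketch of the weighted approach, assuming min-max normalisation and example weights of 0.7 (vector) and 0.3 (text). Both function names are hypothetical, and the exact normalisation PHPVector applies is not shown in the source.

```php
<?php
// Sketch only: min-max normalisation plus a weighted sum, independent of PHPVector.
// Each input maps a document id to a raw score (higher is better).
function minMaxNormalise(array $scores): array
{
    if ($scores === []) {
        return [];
    }
    $min = min($scores);
    $range = max($scores) - $min;

    $normalised = [];
    foreach ($scores as $id => $score) {
        // If all scores are equal there is no spread; treat them all as 1.0
        // (a choice made for this sketch).
        $normalised[$id] = $range > 0 ? ($score - $min) / $range : 1.0;
    }
    return $normalised;
}

function weightedFusion(
    array $vectorScores,
    array $textScores,
    float $vectorWeight = 0.7,
    float $textWeight = 0.3
): array {
    $v = minMaxNormalise($vectorScores);
    $t = minMaxNormalise($textScores);

    $fused = [];
    foreach (array_keys($v + $t) as $id) {
        // A document missing from one list contributes 0 for that side.
        $fused[$id] = $vectorWeight * ($v[$id] ?? 0.0) + $textWeight * ($t[$id] ?? 0.0);
    }

    arsort($fused); // highest combined score first
    return $fused;
}
```

Unlike RRF, this approach is sensitive to the score distribution: min-max scaling maps the weakest hit in each list to 0, so the weights usually need tuning on real queries.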

Configuration

Both the HNSW and BM25 engines are fully configurable. Pass config objects to the VectorDatabase constructor.

Distance metrics

Metric	Best for
`Distance::Cosine`	Text embeddings, normalised vectors
`Distance::Euclidean`	Raw, unnormalized vectors
`Distance::DotProduct`	Unit-normalized vectors (faster than Cosine)
`Distance::Manhattan`	Sparse vectors, robustness to outliers
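The four metrics can be written out as plain functions to show how they differ. These helpers are illustrative and are not the library's code; in particular, how PHPVector turns a dot product into a distance value is not stated in the source.

```php
<?php
// Sketch only: reference formulas for the four metrics, independent of PHPVector.
// All functions expect two numeric arrays of equal length.

function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $sum += $x * $b[$i];
    }
    return $sum;
}

function cosineDistance(array $a, array $b): float
{
    $norms = sqrt(dotProduct($a, $a)) * sqrt(dotProduct($b, $b));
    // Zero vectors have no direction; return the neutral distance 1.0
    // (a choice made for this sketch).
    return $norms > 0.0 ? 1.0 - dotProduct($a, $b) / $norms : 1.0;
}

function euclideanDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $sum += ($x - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

function manhattanDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $sum += abs($x - $b[$i]);
    }
    return $sum;
}
```

For vectors that are already unit length, both norms are 1, so the cosine similarity equals the plain dot product. That is why the dot product is the cheaper choice for unit-normalized embeddings: it skips the two norm computations.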

HNSW tuning cheat-sheet

Goal	Knob
Better recall	Increase `efSearch` or `efConstruction`
Faster queries	Decrease `efSearch`
Less memory	Decrease `M`
Better graph on clustered data	Keep `useHeuristic: true`

Persistence

PHPVector uses a folder-based persistence model. Each database lives in its own directory containing separate files for the HNSW graph, the BM25 index, and one file per document. This design has two key advantages:

Low memory footprint on load — only the HNSW graph and BM25 index are loaded into memory. Individual document files (docs/{n}.bin) are read lazily, only for the documents that appear in search results.
Low insert latency — document files are written to disk asynchronously in a forked child process (requires ext-pcntl), so addDocument() returns immediately.

Folder layout

Saving

Pass a path to the constructor to enable persistence. Each addDocument() call writes the document file to docs/ (asynchronously when ext-pcntl is available). Call save() once to flush the HNSW graph and BM25 index — it waits for any outstanding async writes before proceeding.

Loading

Use VectorDatabase::open() to load a previously saved folder. Only hnsw.bin and bm25.bin are read into memory; document files are loaded on demand after search.

Pass the same HNSWConfig (including the same distance metric) that was used when building the index — a RuntimeException is thrown on mismatch.

Custom configuration on open

Note: Only efSearch and bm25Config/tokenizer affect query-time behaviour and can differ from build time. distance and the graph parameters (M, efConstruction) are fixed at build time — distance is validated on open() and must match.

Incremental updates

You can add new documents to a database that was loaded from disk, then call save() again. The existing document files are left in place; only the new ones are written along with updated index files.

Typical workflow: build once, serve many

Multi-language stop words

Stop words are provided via StopWordsProviderInterface.

Example stop-word list in a plain-text (txt) file, here Italian stop words:

e di a che il la

Available providers:

EnglishStopWords - English stop words (default)
ItalianStopWords - Italian stop words
FileStopWords - Load from file

Deleting and updating documents

Deleted documents are soft-deleted from the HNSW graph (kept for connectivity but excluded from results) and fully removed from the BM25 index. Document files are deleted from disk immediately.

Metadata filtering

Filter search results by document metadata. Filters can be combined with any search method — vector, text, or hybrid.

Creating filters

Use the MetadataFilter value object. All eleven operators are supported:

Filtering search results

Pass filters to any search method. Multiple filters are ANDed together by default.

OR groups (nested arrays)

Wrap filters in a nested array to create OR groups. Filters at the top level are ANDed; filters inside a nested array are ORed.
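The AND/OR semantics can be sketched without the library. In the sketch below a filter is modelled as a closure that receives a metadata array; this representation, and the helper names equals() and matchesFilters(), are assumptions for illustration and do not reflect the MetadataFilter API.

```php
<?php
// Sketch only: evaluates "top level = AND, nested array = OR".
// Filters are modelled as closures here; this is NOT the MetadataFilter API.

// Hypothetical helper: builds an equality filter using strict comparison.
function equals(string $key, mixed $value): Closure
{
    return fn (array $metadata): bool =>
        array_key_exists($key, $metadata) && $metadata[$key] === $value;
}

function matchesFilters(array $metadata, array $filters): bool
{
    foreach ($filters as $filter) {
        if ($filter instanceof Closure) {
            // Top-level filter: every one of these must pass (AND).
            if (!$filter($metadata)) {
                return false;
            }
            continue;
        }

        // Nested array: at least one alternative must pass (OR).
        $anyMatch = false;
        foreach ($filter as $alternative) {
            if ($alternative($metadata)) {
                $anyMatch = true;
                break;
            }
        }
        if (!$anyMatch) {
            return false;
        }
    }
    return true;
}
```

With this model, the filter list [equals('lang', 'en'), [equals('year', 2023), equals('year', 2024)]] reads as "lang is en AND (year is 2023 OR year is 2024)".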

Over-fetching for filtered queries

When filters are applied, the search may need to examine more candidates than k to find enough matching documents. By default, the search fetches k * 5 candidates, then filters. You can tune this:

Note: Filtered queries may return fewer than k results if not enough documents match.
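The over-fetch strategy described above can be sketched as a small post-filtering step. The function name filteredTopK() and its parameters are hypothetical; only the default multiplier of 5 comes from the text.

```php
<?php
// Sketch only: over-fetch, filter, then truncate. Independent of PHPVector.
// $rankedCandidates is the unfiltered result list, best match first.
function filteredTopK(array $rankedCandidates, callable $matches, int $k, int $overFetch = 5): array
{
    // Examine k * overFetch candidates instead of just k.
    $pool = array_slice($rankedCandidates, 0, $k * $overFetch);

    // Drop candidates rejected by the filter, keeping the ranking order.
    $kept = array_values(array_filter($pool, $matches));

    // Return at most k survivors; there may be fewer.
    return array_slice($kept, 0, $k);
}
```

A selective filter needs a larger multiplier. If only one candidate in twenty passes the filter, a multiplier of 5 will often leave fewer than k survivors, while a larger multiplier costs more search work per query.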

Updating metadata

Update metadata on existing documents without re-indexing vectors or text:

The patchMetadata() method:

Merges patch into existing metadata (existing keys preserved unless overwritten)
Does NOT touch HNSW or BM25 indexes (fast, metadata-only operation)
Persists immediately when database has a path configured
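The merge behaviour in the list above corresponds to a plain key-by-key overwrite. The sketch below assumes a shallow merge (nested arrays are replaced as a whole); the source does not say whether patchMetadata() merges nested values, and applyMetadataPatch() is a hypothetical helper.

```php
<?php
// Sketch only: shallow metadata merge, independent of PHPVector.
// Assumption: nested arrays are replaced as a whole, not merged recursively.
function applyMetadataPatch(array $existing, array $patch): array
{
    // array_replace keeps existing keys, overwrites keys present in the
    // patch, and appends keys that are new.
    return array_replace($existing, $patch);
}
```

For example, patching ['lang' => 'en', 'views' => 10] with ['views' => 11, 'reviewed' => true] gives ['lang' => 'en', 'views' => 11, 'reviewed' => true].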

Metadata-only search

Query documents by metadata alone, without a vector or text query:

Note: Documents missing the sortBy key are placed at the end of results. All results have score = 1.0 (no ranking).
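The ordering rule in the note (documents without the sort key go last) can be expressed as a comparator. This is a standalone sketch over plain metadata arrays; sortByMetadataKey() and the $descending flag are hypothetical names.

```php
<?php
// Sketch only: sort metadata rows by one key, rows without the key last.
// Independent of PHPVector; each row is a plain associative array.
function sortByMetadataKey(array $rows, string $sortBy, bool $descending = false): array
{
    usort($rows, function (array $a, array $b) use ($sortBy, $descending): int {
        $hasA = array_key_exists($sortBy, $a);
        $hasB = array_key_exists($sortBy, $b);

        if ($hasA !== $hasB) {
            return $hasA ? -1 : 1; // the row that has the key comes first
        }
        if (!$hasA) {
            return 0; // both rows lack the key: keep their relative order
        }

        $order = $a[$sortBy] <=> $b[$sortBy];
        return $descending ? -$order : $order;
    });

    return $rows;
}
```

Rows without the key stay at the end in both directions, because the missing-key check runs before the direction is applied.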

Strict type comparison

Metadata filtering uses strict type comparison (PHP ===). This means:

String '5' does NOT match integer 5
Float 1.0 does NOT match integer 1
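Strict comparison matters most when filter values arrive as strings, for example from a query string or a form. The sketch below contrasts strict and loose comparison and shows one possible way to convert raw input before filtering. Both helpers are hypothetical, and the conversion only makes sense if your metadata stores numbers as int or float.

```php
<?php
// Sketch only: strict matching and input conversion, independent of PHPVector.

// Identity comparison: value AND type must be the same.
function strictlyEquals(mixed $stored, mixed $wanted): bool
{
    return $stored === $wanted;
}

// Hypothetical helper: turn raw string input into the type a numeric
// metadata field would have been stored with.
function toStoredType(string $raw): int|float|string
{
    if (preg_match('/^-?\d+$/', $raw) === 1) {
        return (int) $raw;   // '5' becomes 5
    }
    if (is_numeric($raw)) {
        return (float) $raw; // '1.5' becomes 1.5
    }
    return $raw;             // anything else stays a string
}
```

Here strictlyEquals(5, '5') is false while the loose comparison 5 == '5' is true, so a filter built from unconverted request input would silently match nothing. Converting first, as in strictlyEquals(5, toStoredType('5')), gives the intended match.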

Custom tokenizer

Implement TokenizerInterface to plug in stemming, lemmatization, or any language-specific logic.
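The source does not show the method signature of TokenizerInterface, so the sketch below is written as a plain function rather than an implementation of the interface. It illustrates the kind of logic a custom tokenizer might contain: lower-casing, splitting, stop-word removal, and a deliberately naive plural stripper. The name sketchTokenize() and the tiny stop-word list are assumptions.

```php
<?php
// Sketch only: tokenizer logic as a plain function. The real
// TokenizerInterface signature is not shown in the source.
function sketchTokenize(string $text, array $stopWords = ['the', 'a', 'of', 'and']): array
{
    // strtolower() only changes ASCII letters in PHP 8.2+, which keeps
    // multi-byte UTF-8 sequences intact but leaves accented capitals as-is.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // invalid UTF-8 input
    }

    $tokens = [];
    foreach ($words as $word) {
        if (in_array($word, $stopWords, true)) {
            continue; // drop stop words
        }
        // Naive plural stripping ('vectors' becomes 'vector'); a real
        // stemmer would handle far more cases.
        if (strlen($word) > 3 && str_ends_with($word, 's') && !str_ends_with($word, 'ss')) {
            $word = substr($word, 0, -1);
        }
        $tokens[] = $word;
    }
    return $tokens;
}
```

Whatever logic you plug in, the same tokenizer has to be applied to documents at index time and to queries at search time; otherwise query terms will not line up with the indexed terms.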

Benchmark

A VectorDBBench-style CLI benchmark lives in benchmark/. It measures index build throughput, serial QPS, P99 tail latency, Recall@k against brute-force ground truth, and persistence speed.
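Two of these metrics are easy to misread, so here is a sketch of how Recall@k and a P99 latency are typically computed. The function names are hypothetical, and the nearest-rank percentile method is an assumption; the benchmark script may use a different definition.

```php
<?php
// Sketch only: common definitions of Recall@k and percentile latency.
// Independent of the benchmark script shipped with the package.

// Fraction of the true top-k neighbours (from brute force) that the
// approximate search also returned in its top k.
function recallAtK(array $approxIds, array $exactIds, int $k): float
{
    $approx = array_slice($approxIds, 0, $k);
    $exact = array_slice($exactIds, 0, $k);
    if ($exact === []) {
        return 0.0;
    }
    return count(array_intersect($approx, $exact)) / count($exact);
}

// Nearest-rank percentile: the smallest sample such that at least
// $p percent of all samples are less than or equal to it.
function percentile(array $samples, float $p): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('At least one sample is required.');
    }
    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100);
    return (float) $samples[max(0, $rank - 1)];
}
```

A Recall@10 of 0.9 means that, on average, nine of the ten true nearest neighbours were found. P99 is the latency that 99% of queries stay at or below, which makes it far more sensitive to occasional slow queries than the mean.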

Available scenarios

Key	Vectors	Dims	Notes
`xs`	1,000	128	Quick smoke test
`small`	10,000	128	SIFT-small scale
`medium`	50,000	128	SIFT-medium scale
`large`	100,000	128	Requires ~512 MB RAM
`highdim`	10,000	768	Text-embedding scale (Cohere-style)

The report is printed as Markdown to stdout (or a file via --output). Progress messages go to stderr so piping works cleanly: php benchmark/benchmark.php > report.md.

Running the tests

Copyright

All versions of phpvector with dependencies

Version 0.3.0 Release 28. May 2026
Requires php Version ^8.2