Download the PHP package token27/nexus-ai-tokenizer without Composer
On this page you can find all versions of the php package token27/nexus-ai-tokenizer. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Download token27/nexus-ai-tokenizer
More information about token27/nexus-ai-tokenizer
Files in token27/nexus-ai-tokenizer
Package nexus-ai-tokenizer
Short Description Provider-agnostic LLM token counter for PHP 8.3+. Supports Tiktoken BPE (OpenAI, Claude approx.), HuggingFace JSON vocabularies (DeepSeek, LLaMA, Mistral), SentencePiece (Gemini), and extensible custom strategies.
License MIT
Informations about the package nexus-ai-tokenizer
nexus-ai-tokenizer
A universal, multi-provider PHP 8.3+ token counting and context estimation library. Manage and estimate context windows across fragmented AI models with an elegant and extensible API.
Why nexus-ai-tokenizer?
Different providers use completely different tokenization algorithms. OpenAI relies on Tiktoken (cl100k_base, o200k_base), DeepSeek uses HuggingFace BPE, Gemini uses SentencePiece, and Claude has a proprietary BPE.
nexus-ai-tokenizer solves this fragmentation by:
- Zero-config OpenAI counting: Built-in, fast, exact token counting for all modern OpenAI models (
gpt-4o,o1,gpt-3.5-turbo, etc.). - Consistent Interface: Handle any provider with one method (
TokenizerEngine::for('model')). - Graceful Approximations: Seamlessly approximate limits for closed tokenizers (like Claude) using
cl100k_base. - Exact Native Integration: Directly load
.jsonHuggingFace and.modelSentencePiece vocabularies when you need 100% exact math. - Multimodal Counting: Translates image resolutions and detail settings into exact token costs across providers.
- Batching & Context Windows: Easy percentage checks,
isWithinContextWindow(), and batch token processing.
Features
- Built-in Catalog: Out-of-the-box support mapping 50+ mainstream models to the correct strategy.
- Tiktoken Strategy: Rapid exact counts for
o200k_base,cl100k_base, and more. - ChatML & Conversation Overhead:
countChat()handles exact provider metadata framing. - Extensible Architecture: Define your own custom tokenizer strategies and dynamic providers.
- Type Safety: PHPStan Level 8, immutable Value Objects.
Installation
Requires: PHP 8.3+
Note: For exact SentencePiece tokenization (e.g. Gemini/Gemma), the textualization/sentencepiece extension is optionally required. For HuggingFace vocabularies, no extensions are needed.
Quick Start
1. Zero-config Token Counting
Count exact tokens for any OpenAI model right out of the box:
2. Multi-turn Chat Counting
Accurately calculate token payloads including provider-specific overheads (role markers, ChatML syntax):
3. Context Window Management
Verify if prompts will fit within a provider's window limit to handle automatic truncation or provider switching:
4. Multimodal Image Tokens
Translates image dimensions to required token spend:
Documentation
- Installation & Setup — Dependencies and optional extensions
- Text & Chat Token Counting — Calculating content and formatting overhead
- Context Windows — Handling prompt limits and budget calculations
- Exact vs Approximate Counting — How closed models are approximated
- HuggingFace BPE — Using
.jsonvocabularies for exact DeepSeek/Llama counts - SentencePiece / Gemini — Using
.modelfiles with FFI - Vision & Images — Estimating tokens for multimodal images
- Custom Strategies — Developing dynamic catalogs and implementations
- Testing — Running the test suite
- Contributing — Development guidelines
License
MIT. See LICENSE.
All versions of nexus-ai-tokenizer with dependencies
ext-json Version *
ext-mbstring Version *
danny50610/bpe-tokeniser Version ^0.3.0