Package: markdown-object
Short Description: Structure-aware, token-smart chunking for Markdown documents
License: MIT
Homepage: https://github.com/benbjurstrom/markdown-object
Markdown Object
Intelligent Markdown chunking that preserves document structure and semantic relationships. Creates token-aware chunks optimized for embedding model context windows. Built on League CommonMark and Yethee\Tiktoken.
Try It Out
Clone the Interactive Demo to experiment with chunking in real time. Paste your Markdown, adjust parameters, and see how content gets split into semantic chunks.
Basic Usage
Installation
You can install the package via Composer with `composer require benbjurstrom/markdown-object`.
Advanced Usage
JSON Serialization
Custom Tokenizer
Custom Chunking Parameters
A Note on Token Counts
Chunk token counts include separator tokens (\n\n) added when joining content pieces, so they may be slightly higher than the sum of individual node tokens. This is expected and ensures the count accurately reflects what will be embedded.
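To see why a chunk's count can exceed the sum of its parts, here is a minimal sketch using a toy tokenizer. The `countTokens()` function is hypothetical and not part of the package: it counts each whitespace-delimited word as one token and each `\n\n` separator as one token. Real tiktoken counts will differ, but the effect of the separators is the same in kind.

```php
<?php

/**
 * Toy tokenizer (hypothetical, illustration only): each run of non-whitespace
 * characters counts as one token, and each "\n\n" separator counts as one token.
 * Real BPE tokenizers such as tiktoken give different numbers.
 */
function countTokens(string $text): int
{
    return preg_match_all('/\S+/', $text) + substr_count($text, "\n\n");
}

$pieces = [
    '## Setup',
    'Install the dependencies first.',
    'Then run the build.',
];

// Tokens of each piece counted on its own, then summed.
$sumOfPieces = array_sum(array_map('countTokens', $pieces));

// Tokens of the text that is actually embedded: the pieces joined by "\n\n".
$chunkTokens = countTokens(implode("\n\n", $pieces));

echo "sum of pieces: {$sumOfPieces}, chunk: {$chunkTokens}\n";
// sum of pieces: 10, chunk: 12
```

Joining three pieces adds two separators, so the chunk total is two tokens higher than the per-piece sum.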
Chunking Strategy
The package uses hierarchical greedy packing to create semantically coherent chunks that respect your document's natural structure.
Algorithm Overview
The chunker intelligently splits content using a two-threshold system:
- `target` – Soft limit for splitting large content blocks (paragraphs, code, tables)
- `hardCap` – Hard limit for hierarchical decisions (when to split vs. keep sections together)
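The README does not say exactly where `target` boundaries fall. As a minimal sketch, assuming sentence boundaries and a toy word-count tokenizer (`wordTokens()` and `splitParagraph()` are hypothetical names, not the package's API), a soft limit can be implemented by packing sentences greedily up to `target` while keeping any single over-long sentence whole.

```php
<?php

/** Toy token counter (hypothetical): one token per whitespace-delimited word. */
function wordTokens(string $text): int
{
    return preg_match_all('/\S+/', $text);
}

/**
 * Split a long paragraph into pieces of roughly $target tokens, breaking only
 * between sentences. A sentence that alone exceeds $target is emitted whole,
 * which is what makes $target a soft limit.
 *
 * @return list<string>
 */
function splitParagraph(string $paragraph, int $target): array
{
    // Assumption: a sentence ends at ., ! or ? followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($paragraph), -1, PREG_SPLIT_NO_EMPTY);

    $pieces = [];
    $current = [];
    $currentTokens = 0;

    foreach ($sentences as $sentence) {
        $tokens = wordTokens($sentence);
        // Close the current piece if this sentence would push it past $target.
        if ($current !== [] && $currentTokens + $tokens > $target) {
            $pieces[] = implode(' ', $current);
            $current = [];
            $currentTokens = 0;
        }
        $current[] = $sentence;
        $currentTokens += $tokens;
    }

    if ($current !== []) {
        $pieces[] = implode(' ', $current);
    }

    return $pieces;
}

$paragraph = 'One two three. Four five six. '
    . 'Seven eight nine ten eleven twelve thirteen fourteen. Fifteen.';

foreach (splitParagraph($paragraph, 6) as $piece) {
    echo wordTokens($piece), ': ', $piece, "\n";
}
// 6: One two three. Four five six.
// 8: Seven eight nine ten eleven twelve thirteen fourteen.
// 1: Fifteen.
```

The 8-token sentence comes back as its own piece even though it exceeds `target`; a hard limit would instead have to cut it mid-sentence.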
How It Works
- Start whole – If the entire document fits within `hardCap`, return it as a single chunk.
- Split hierarchically – When it is too large, split at the highest heading level (H1, then H2, and so on).
- Pack greedily – Combine sibling sections that fit together within `hardCap`.
- Recurse deeply – Process sections that don't fit recursively, with updated breadcrumbs.
- Minimize fragments – After recursion, continue packing the remaining siblings to avoid orphaned content.
- Split smartly – Break long paragraphs, code blocks, and tables at `target` boundaries while preserving readability.
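The steps above can be sketched in PHP roughly as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the `Section` class, the `tokens()` word counter, and `chunkSection()` are hypothetical; the heading tree is built by hand instead of being parsed with League CommonMark; and oversized body text is kept whole rather than being split at `target`.

```php
<?php

/** Hypothetical section node: a heading line, its own body text, and child sections. */
final class Section
{
    /** @param list<Section> $children */
    public function __construct(public string $title, public string $body = '', public array $children = []) {}

    /** Text of this section and all of its descendants, joined by blank lines. */
    public function text(): string
    {
        $parts = [$this->title];
        if ($this->body !== '') {
            $parts[] = $this->body;
        }
        foreach ($this->children as $child) {
            $parts[] = $child->text();
        }
        return implode("\n\n", $parts);
    }
}

/** Toy token counter (hypothetical): one token per whitespace-delimited word. */
function tokens(string $text): int
{
    return preg_match_all('/\S+/', $text);
}

/**
 * Hierarchical greedy packing of a section tree against $hardCap.
 * @param list<string> $crumbs Headings of the enclosing sections.
 * @return list<array{crumbs: list<string>, text: string}>
 */
function chunkSection(Section $section, array $crumbs, int $hardCap): array
{
    // Start whole: keep the section as one chunk if it fits within $hardCap
    // (or if there is nothing below it left to split).
    $whole = $section->text();
    if (tokens($whole) <= $hardCap || ($section->body === '' && $section->children === [])) {
        return [['crumbs' => $crumbs, 'text' => $whole]];
    }

    // Split hierarchically: descend one heading level; this heading becomes a breadcrumb.
    $crumbs[] = $section->title;
    $units = $section->body !== '' ? [$section->body] : [];
    array_push($units, ...$section->children);

    $chunks = [];
    $buffer = [];
    $flush = function () use (&$chunks, &$buffer, $crumbs): void {
        if ($buffer !== []) {
            $chunks[] = ['crumbs' => $crumbs, 'text' => implode("\n\n", $buffer)];
            $buffer = [];
        }
    };

    foreach ($units as $unit) {
        $text = $unit instanceof Section ? $unit->text() : $unit;
        if (tokens($text) > $hardCap) {
            $flush();
            if ($unit instanceof Section) {
                // Recurse deeply with the updated breadcrumbs.
                array_push($chunks, ...chunkSection($unit, $crumbs, $hardCap));
            } else {
                // Oversized body text: per the README this breaks at `target`; kept whole here.
                $chunks[] = ['crumbs' => $crumbs, 'text' => $text];
            }
            // Minimize fragments: later siblings keep packing into a fresh buffer.
            continue;
        }
        // Pack greedily: start a new chunk only when this sibling would overflow $hardCap.
        if ($buffer !== [] && tokens(implode("\n\n", [...$buffer, $text])) > $hardCap) {
            $flush();
        }
        $buffer[] = $text;
    }
    $flush();

    return $chunks;
}

$doc = new Section('# Guide', 'Intro paragraph here.', [
    new Section('## Install', 'Run the installer and follow the prompts.'),
    new Section('## Configure', 'Set the options you need.', [
        new Section('### Cache', 'Cache settings go here and can be long.'),
        new Section('### Queue', 'Queue settings go here too.'),
    ]),
    new Section('## FAQ', 'Short answers.'),
]);

foreach (chunkSection($doc, [], 16) as $chunk) {
    echo implode(' > ', $chunk['crumbs']), ' (', tokens($chunk['text']), " tokens)\n";
}
// # Guide (12 tokens)
// # Guide > ## Configure (15 tokens)
// # Guide > ## Configure (7 tokens)
// # Guide (4 tokens)
```

Here the whole document (42 toy tokens) exceeds a `hardCap` of 16, so it is split under `# Guide`. The intro and `## Install` pack together, `## Configure` is too large and is recursed into with the breadcrumb `# Guide > ## Configure`, and `## FAQ` then starts a fresh chunk instead of being attached to a different branch.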
Testing
Run the tests with:
Documentation
For detailed architecture documentation, see ARCHITECTURE.md.
For examples of hierarchical packing behavior, see EXAMPLES.md.
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
Credits
- Ben Bjurstrom
- All Contributors
License
The MIT License (MIT). Please see License File for more information.