Download the PHP package llm-html-extractor/symfony-bundle without Composer
On this page you can find all versions of the php package llm-html-extractor/symfony-bundle. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package symfony-bundle
Short Description Symfony bundle for extracting structured data from HTML using LLM providers
License MIT
Information about the package symfony-bundle
LLM HTML Extractor Symfony Bundle
A powerful Symfony bundle for extracting structured data from HTML using LLM (Large Language Model) providers with a plugin architecture.
Features
- LLM-Based Extraction: Uses LLM providers (starting with Jina Reader) to extract structured data from HTML
- Type-Safe DTOs: Define extraction schemas using PHP attributes on your DTOs
- Hybrid Extraction: Easily combine LLM extraction with code-based extraction - use AI for complex fields and DomCrawler/XPath for simple structured data
- Extensible: Plugin architecture allows custom extractors for specific use cases
- Cacheable: Built-in caching support for LLM responses
- Logging: Optional logging for LLM requests/responses and cache operations
- Configurable: Flexible configuration for different LLM providers and caching strategies
Installation
Install the bundle with Composer: composer require llm-html-extractor/symfony-bundle
Configuration
Create or update config/packages/llm_html_extractor.yaml:
Alternatively, you can use an existing HTTP client service:
Using a Custom LLM Client
To use your own LLM client implementation, just set the client parameter to your service ID:
Your custom client must implement LlmHtmlExtractor\SymfonyBundle\Client\LlmClientInterface. The bundle will validate this during container compilation and throw a clear error if the interface is not implemented.
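The kind of check described above can be sketched in plain PHP. The interface below is a hypothetical stand-in: the real method names and signatures of LlmClientInterface are not shown here and may differ, and complete(), EchoLlmClient, NotAClient and assertImplementsClientInterface() are illustrative names only.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the bundle's LlmClientInterface; the real
// contract is not shown in this document and may look different.
interface LlmClientInterface
{
    public function complete(string $prompt): string;
}

// A custom client that satisfies the (assumed) contract.
final class EchoLlmClient implements LlmClientInterface
{
    public function complete(string $prompt): string
    {
        return 'echo: ' . $prompt;
    }
}

// A class that was configured as a client but does not implement the interface.
final class NotAClient
{
}

/**
 * Mimics a container-compilation check: reject any configured class that
 * does not implement the required interface, naming both in the message.
 */
function assertImplementsClientInterface(string $class): void
{
    if (!is_a($class, LlmClientInterface::class, true)) {
        throw new InvalidArgumentException(sprintf(
            'Class "%s" must implement "%s".',
            $class,
            LlmClientInterface::class
        ));
    }
}

assertImplementsClientInterface(EchoLlmClient::class); // passes silently

try {
    assertImplementsClientInterface(NotAClient::class);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Failing early like this surfaces a misconfigured service ID at build time rather than on the first extraction request.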
Logging
The bundle provides comprehensive logging for debugging and monitoring:
- Request/Response Logging: When logs.enabled: true, all LLM requests and responses are logged at info level
- Cache Operations: Cache hits and misses are logged when both caching and logging are enabled
- Error Logging: Failed LLM requests are logged at error level with exception details
The decorators are applied in this order:
- Base LLM Client (e.g., JinaReaderLlmClient)
- LoggingLlmClient (if logs enabled) - logs requests/responses
- CacheableLlmClient (if cache enabled) - logs cache hits/misses
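Because the cache decorator wraps the logging decorator, a cache hit returns before the logging client is reached. Logged requests therefore correspond to actual LLM calls (cache misses), not cached responses.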
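The effect of this ordering can be illustrated with a minimal, self-contained sketch. It does not use the bundle's classes: the interface, the complete() method, and the FakeBaseClient, LoggingClient and CachingClient names are all assumptions made for illustration, and an in-memory array stands in for the real cache.

```php
<?php
declare(strict_types=1);

// Hypothetical interface; the bundle's real LlmClientInterface may differ.
interface LlmClientInterface
{
    public function complete(string $prompt): string;
}

// Stand-in for the base client; counts how often the "LLM" is really called.
final class FakeBaseClient implements LlmClientInterface
{
    public int $calls = 0;

    public function complete(string $prompt): string
    {
        $this->calls++;
        return strtoupper($prompt);
    }
}

// Middle layer: records every request that reaches it.
final class LoggingClient implements LlmClientInterface
{
    /** @var string[] */
    public array $log = [];

    public function __construct(private LlmClientInterface $inner) {}

    public function complete(string $prompt): string
    {
        $this->log[] = 'request: ' . $prompt;
        return $this->inner->complete($prompt);
    }
}

// Outermost layer: only delegates inward on a cache miss.
final class CachingClient implements LlmClientInterface
{
    /** @var array<string, string> */
    private array $cache = [];

    public function __construct(private LlmClientInterface $inner) {}

    public function complete(string $prompt): string
    {
        $key = hash('sha256', $prompt);
        if (!isset($this->cache[$key])) {
            $this->cache[$key] = $this->inner->complete($prompt);
        }
        return $this->cache[$key];
    }
}

$base = new FakeBaseClient();
$logging = new LoggingClient($base);
$client = new CachingClient($logging);

$client->complete('hello'); // cache miss: reaches the logger and the base client
$client->complete('hello'); // cache hit: answered by the outer layer alone

printf("LLM calls: %d, log entries: %d\n", $base->calls, count($logging->log));
```

Two identical requests produce one LLM call and one log entry, which matches the behaviour described above.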
Usage
1. Define Your Extraction DTO
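As a rough sketch of how an attribute-based schema can work in PHP, the example below annotates DTO properties and reads the annotations back through reflection. The ExtractField attribute, the ProductDto class and the describeSchema() helper are hypothetical names chosen for illustration. The bundle's actual attributes are not shown here and will likely differ.

```php
<?php
declare(strict_types=1);

// Hypothetical attribute; not the bundle's real attribute class.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class ExtractField
{
    public function __construct(public string $description) {}
}

// A DTO whose properties describe what should be extracted from the page.
final class ProductDto
{
    #[ExtractField('The product name as shown in the page heading')]
    public ?string $name = null;

    #[ExtractField('The price as a decimal number without a currency symbol')]
    public ?float $price = null;
}

/**
 * Builds a "property name => description" map from the attributes on a DTO.
 *
 * @return array<string, string>
 */
function describeSchema(string $dtoClass): array
{
    $schema = [];
    foreach ((new ReflectionClass($dtoClass))->getProperties() as $property) {
        foreach ($property->getAttributes(ExtractField::class) as $attribute) {
            $schema[$property->getName()] = $attribute->newInstance()->description;
        }
    }
    return $schema;
}

print_r(describeSchema(ProductDto::class));
```

A map like this is the sort of information an extractor could turn into instructions for the LLM, while the typed properties keep the result type-safe on the PHP side.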
2. Use the Extraction Handler
3. Create Custom Extractors (Optional)
For specific extraction needs, implement the FromHtmlExtractorInterface:
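A code-based extractor for a field that sits at a predictable place in the markup might look roughly like the sketch below. The single-method interface is an assumption (the real FromHtmlExtractorInterface signature is not shown here), PriceExtractor is an illustrative name, and PHP's built-in DOM extension is used instead of DomCrawler so the example stays self-contained.

```php
<?php
declare(strict_types=1);

// Hypothetical shape of the interface; the bundle's real signature may differ.
interface FromHtmlExtractorInterface
{
    public function extract(string $html): ?string;
}

// Extracts the text of the first <span class="price"> element, if any.
final class PriceExtractor implements FromHtmlExtractorInterface
{
    public function extract(string $html): ?string
    {
        $document = new DOMDocument();

        // Real-world HTML is rarely valid; collect libxml warnings internally
        // instead of emitting them, then restore the previous setting.
        $previous = libxml_use_internal_errors(true);
        $document->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $nodes = (new DOMXPath($document))->query('//span[@class="price"]');
        if ($nodes === false || $nodes->length === 0) {
            return null;
        }

        return trim($nodes->item(0)->textContent);
    }
}

$html = '<html><body><h1>Widget</h1><span class="price"> 19.99 </span></body></html>';
echo (new PriceExtractor())->extract($html), PHP_EOL;
```

Handling simple fields this way avoids an LLM call for them, leaving the model to deal with fields that have no stable location in the markup.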
Supported LLM Providers
Currently supported:
- Jina Reader (jinaai/readerlm-v2, jinaai/readerlm-v1.5)
  - Uses vLLM OpenAI API standard endpoint (/openai/v1/chat/completions)
  - Tested with Runpod serverless deployments
  - Compatible with any vLLM deployment following the OpenAI API standard
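For orientation, a request to an OpenAI-compatible chat completions endpoint generally has the shape sketched below. The base URL is a placeholder, buildChatCompletionRequest() is a hypothetical helper, and the way the instruction and HTML are combined into a single user message is an assumption. The prompt format the bundle actually sends is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Builds the URL and JSON body for an OpenAI-style chat completion request.
 * Hypothetical helper for illustration; not part of the bundle.
 *
 * @return array{url: string, body: array<string, mixed>}
 */
function buildChatCompletionRequest(
    string $baseUrl,
    string $model,
    string $html,
    string $instruction
): array {
    return [
        // Endpoint path as listed above, appended to the deployment's base URL.
        'url' => rtrim($baseUrl, '/') . '/openai/v1/chat/completions',
        'body' => [
            'model' => $model,
            'messages' => [
                // Assumed prompt layout: instruction first, then the raw HTML.
                ['role' => 'user', 'content' => $instruction . "\n\n" . $html],
            ],
            'temperature' => 0,
        ],
    ];
}

$request = buildChatCompletionRequest(
    'https://example.invalid/', // placeholder base URL
    'jinaai/readerlm-v2',
    '<h1>Widget</h1>',
    'Extract the product name as JSON.'
);

echo $request['url'], PHP_EOL;
echo json_encode($request['body'], JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Because the body follows the common OpenAI request format, the same payload should be accepted by any deployment that exposes this endpoint.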
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
All versions of symfony-bundle with dependencies
symfony/dependency-injection Version ^6.4|^7.0
symfony/config Version ^6.4|^7.0
symfony/http-kernel Version ^6.4|^7.0
symfony/http-client Version ^6.4|^7.0
symfony/property-access Version ^6.4|^7.0
symfony/property-info Version ^6.4|^7.0
symfony/serializer Version ^6.4|^7.0
symfony/cache Version ^6.4|^7.0
symfony/dom-crawler Version ^6.4|^7.0
symfony/yaml Version ^6.4|^7.0