Package: dachcom-digital/dynamic-search-data-provider-crawler
Dynamic Search | Data Provider: Web Crawler
A spider crawler extension for Pimcore Dynamic Search.
Release Plan
Release | Supported Pimcore Versions | Supported Symfony Versions | Release Date | Maintained | Branch |
---|---|---|---|---|---|
3.x | 11.0 | ^6.2 | 28.09.2023 | Feature Branch | master |
2.x | 10.0 - 10.6 | ^5.4 | 19.12.2021 | No | 2.x |
1.x | 6.6 - 6.9 | ^4.4 | 18.04.2021 | No | 1.x |
Installation
Dynamic Search Bundle
You need to install and enable the Dynamic Search Bundle first (see the Dynamic Search Bundle documentation). After that, proceed as follows:
Add the bundle to bundles.php.
Basic Setup
Provider Options
always
Name | Default Value | Description |
---|---|---|
own_host_only | false | |
allow_subdomains | false | |
allow_query_in_url | false | |
allow_hash_in_url | false | |
allowed_mime_types | ['text/html', 'application/pdf'] | |
allowed_schemes | ['http'] | |
content_max_size | 0 | |
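The option names suggest how discovered links are filtered before they are fetched. The following Python sketch only illustrates that reading and is not the bundle's PHP implementation. It assumes that own_host_only compares a link's host with the seed host, that allow_subdomains widens this to subdomains of the seed host, and that links with a query string or a #fragment are dropped unless the matching flag is enabled. allowed_mime_types and content_max_size apply to fetched responses and are left out. is_allowed and seed_host are hypothetical names.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit

# Defaults copied from the "always" table; the filtering logic below is an assumption.
DEFAULTS: Dict[str, Any] = {
    "own_host_only": False,
    "allow_subdomains": False,
    "allow_query_in_url": False,
    "allow_hash_in_url": False,
    "allowed_schemes": ["http"],
}


def is_allowed(url: str, seed_host: str, options: Optional[Dict[str, Any]] = None) -> bool:
    """Return True if the link passes the assumed prefetch filters (hypothetical helper)."""
    opts = {**DEFAULTS, **(options or {})}
    parts = urlsplit(url)

    # Only follow links whose scheme is explicitly allowed.
    if parts.scheme not in opts["allowed_schemes"]:
        return False
    # Drop links with a query string or a #fragment unless the flag enables them.
    if parts.query and not opts["allow_query_in_url"]:
        return False
    if parts.fragment and not opts["allow_hash_in_url"]:
        return False

    # Restrict to the seed host, optionally including its subdomains.
    if opts["own_host_only"]:
        host = parts.hostname or ""
        same_host = host == seed_host
        subdomain = opts["allow_subdomains"] and host.endswith("." + seed_host)
        if not (same_host or subdomain):
            return False
    return True
```

Note that with the default allowed_schemes of ['http'], an https link is rejected in this sketch, so a site served over TLS would need 'https' added to that list.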
full_dispatch
Name | Default Value | Description |
---|---|---|
seed | null | |
valid_links | [] | |
user_invalid_links | [] | |
max_link_depth | 15 | |
max_crawl_limit | 0 | |
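How the full_dispatch options interact can be pictured as a breadth-first crawl. This Python sketch runs over a hypothetical in-memory link graph instead of real HTTP requests, and it rests on assumptions not stated above: valid_links and user_invalid_links hold regular expressions, the seed sits at depth 0, and a max_crawl_limit of 0 means no limit. The crawl function and its parameters are illustrative only.

```python
import re
from collections import deque
from typing import Dict, List, Sequence


def crawl(seed: str,
          links: Dict[str, List[str]],
          valid_links: Sequence[str] = (),
          user_invalid_links: Sequence[str] = (),
          max_link_depth: int = 15,
          max_crawl_limit: int = 0) -> List[str]:
    """Breadth-first crawl over an in-memory link graph (illustration only)."""

    def accepted(url: str) -> bool:
        # Excluded patterns always win; an empty valid_links list allows everything.
        if any(re.search(p, url) for p in user_invalid_links):
            return False
        return not valid_links or any(re.search(p, url) for p in valid_links)

    visited: List[str] = []
    seen = {seed}
    queue = deque([(seed, 0)])  # (url, depth); the seed is assumed to be depth 0

    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        # Assumption: 0 disables the crawl limit.
        if max_crawl_limit and len(visited) >= max_crawl_limit:
            break
        # Do not expand links found on pages at the maximum depth.
        if depth >= max_link_depth:
            continue
        for nxt in links.get(url, []):
            if nxt not in seen and accepted(nxt):
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited
```

In this model, passing user_invalid_links=[r"^/members"] prunes every matching URL before it is queued, while max_link_depth=1 visits the seed and its direct links but follows nothing further.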
single_dispatch
Name | Default Value | Description |
---|---|---|
host | null | |
Resource Normalizer
DefaultResourceNormalizer
Identifier: web_crawler_default_resource_normalizer
Normalize simple documents
Options: none
LocalizedResourceNormalizer
Identifier: web_crawler_localized_resource_normalizer
Scaffold localized documents
Options:
Name | Default Value | Allowed Type | Description |
---|---|---|---|
locales | all enabled Pimcore languages | array | |
skip_not_localized_documents | true | bool | If false, an exception is thrown when a document/object has no valid locale |
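The effect of skip_not_localized_documents amounts to a switch between skipping and failing. In this Python sketch, resolve_locales and MissingLocaleError are hypothetical names; the sketch assumes a resource carries at most one locale and ignores how the real normalizer builds its localized resources.

```python
from typing import Iterable, List, Optional


class MissingLocaleError(Exception):
    """Hypothetical error for a resource without a valid locale."""


def resolve_locales(resource_locale: Optional[str],
                    locales: Iterable[str],
                    skip_not_localized_documents: bool = True) -> List[str]:
    """Return the locale(s) a resource should be indexed under (illustration only)."""
    allowed = list(locales)
    if resource_locale is not None and resource_locale in allowed:
        return [resource_locale]
    if skip_not_localized_documents:
        # Default (true): the resource is silently skipped.
        return []
    # false: fail loudly, mirroring the exception described in the table.
    raise MissingLocaleError(f"no valid locale for resource (got {resource_locale!r})")
```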
Transformer
Scaffolder
HttpResponseHtmlDataScaffolder
Identifier: http_response_html_scaffolder
Simple object scaffolder.
Supported types: VDB\Spider\Resource with content-type text/html.
HttpResponsePdfDataScaffolder
Identifier: http_response_pdf_scaffolder
Simple object scaffolder.
Supported types: VDB\Spider\Resource with content-type application/pdf.
PimcoreElementScaffolder
Identifier: pimcore_element_scaffolder
Simple object scaffolder.
Supported types: Asset, Document, DataObject\Concrete.
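The two HTTP scaffolders are distinguished by the response's content type. A minimal Python sketch of that dispatch follows, assuming that parameters such as "; charset=utf-8" are ignored when matching. pick_scaffolder is a hypothetical helper, and Pimcore elements handled by pimcore_element_scaffolder are left out.

```python
from typing import Optional

# Content type -> scaffolder identifier, as listed above.
SCAFFOLDERS = {
    "text/html": "http_response_html_scaffolder",
    "application/pdf": "http_response_pdf_scaffolder",
}


def pick_scaffolder(content_type: str) -> Optional[str]:
    """Return the scaffolder identifier for a response content type, or None."""
    # Strip parameters such as "; charset=UTF-8" and normalise case before matching.
    mime = content_type.split(";", 1)[0].strip().lower()
    return SCAFFOLDERS.get(mime)
```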
Field Transformer
UriExtractor
Identifier: resource_uri_extractor
Supported Scaffolders: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options: none
LanguageExtractor
Identifier: resource_language_extractor
Supported Scaffolders: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options: none
MetaExtractor
Identifier: resource_meta_extractor
Supported Scaffolder: http_response_html_scaffolder
Return Type: string|null
Options:
Name | Default Value | Allowed Type | Description |
---|---|---|---|
name | null | string | The name of the meta tag to fetch the value from |
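To illustrate what the name option selects, this Python sketch uses the standard library's html.parser to return the content attribute of the first meta tag whose name matches. The case-insensitive match and the first-match rule are assumptions; MetaContentParser and extract_meta are hypothetical names, not part of the bundle.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaContentParser(HTMLParser):
    """Remember the content attribute of the first <meta name="..."> that matches."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target.lower()
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; self-closing <meta/> also lands here.
        if tag != "meta" or self.found is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == self.target:
            self.found = attributes.get("content")


def extract_meta(html: str, name: str) -> Optional[str]:
    """Return the meta tag's content, or None if no tag with that name exists."""
    parser = MetaContentParser(name)
    parser.feed(html)
    parser.close()
    return parser.found
```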
HtmlTagExtractor
Identifier: resource_html_tag_content_extractor
Supported Scaffolder: http_response_html_scaffolder
Return Type: string|null
Options: none
TextExtractor
Identifier: resource_text_extractor
Supported Scaffolders: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options:
Name | Default Value | Allowed Type | Description |
---|---|---|---|
content_start_indicator | <!-- main-content --> | string | Marks the beginning of the indexable page content |
content_end_indicator | <!-- /main-content --> | string | Marks the end of the indexable page content |
content_exclude_start_indicator | null | null or string | Marks the beginning of the text to be excluded from indexing |
content_exclude_end_indicator | null | null or string | Marks the end of the text to be excluded from indexing |
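The indicator options describe a slice of the page. Using the default markers from the table, the Python sketch below keeps only the text between the start and end indicators, removes every excluded region, and strips tags with a naive regular expression. Returning None when a marker is missing is an assumption, and extract_text is a hypothetical name rather than the bundle's API.

```python
import re
from typing import Optional


def extract_text(html: str,
                 content_start_indicator: str = "<!-- main-content -->",
                 content_end_indicator: str = "<!-- /main-content -->",
                 content_exclude_start_indicator: Optional[str] = None,
                 content_exclude_end_indicator: Optional[str] = None) -> Optional[str]:
    """Return the plain text between the content indicators (illustration only)."""
    start = html.find(content_start_indicator)
    if start == -1:
        return None  # assumption: no start marker means nothing to index
    start += len(content_start_indicator)
    end = html.find(content_end_indicator, start)
    if end == -1:
        return None  # assumption: the end marker is required as well
    body = html[start:end]

    # Remove every region between the exclude markers (non-greedy, across lines).
    if content_exclude_start_indicator and content_exclude_end_indicator:
        pattern = (re.escape(content_exclude_start_indicator) + r".*?"
                   + re.escape(content_exclude_end_indicator))
        body = re.sub(pattern, " ", body, flags=re.DOTALL)

    text = re.sub(r"<[^>]+>", " ", body)  # naive tag stripping, not a real HTML parser
    text = " ".join(text.split())         # collapse runs of whitespace
    return text or None
```

Wrapping navigation, footers, or teaser blocks in the exclude markers keeps them out of the index while the surrounding main content is still extracted.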
TitleExtractor
Identifier: resource_title_extractor
Supported Scaffolders: http_response_html_scaffolder, http_response_pdf_scaffolder
Return Type: string|null
Options: none
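For HTML resources, a title extractor most likely reads the <title> element. The hedged Python sketch below does this with html.parser, collapsing whitespace and returning None when no title is present. PDF titles, which would come from document metadata, are not covered, and TitleParser and extract_title are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Accumulate the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html: str) -> Optional[str]:
    """Return the page title, or None if the document has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```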
Copyright and License
Copyright: DACHCOM.DIGITAL
For licensing details please visit LICENSE.md
Upgrade Info
Before updating, please check our upgrade notes!
All versions of dynamic-search-data-provider-crawler with dependencies
vdb/php-spider Version ^0.7
dachcom-digital/dynamic-search Version ^3.0