Download the PHP package vipnytt/sitemapparser without Composer
On this page you can find all versions of the php package vipnytt/sitemapparser. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: sitemapparser
Short description: XML Sitemap parser class compliant with the Sitemaps.org protocol.
License: MIT
Homepage: https://github.com/VIPnytt/SitemapParser
Information about the package sitemapparser
XML Sitemap parser
An easy-to-use PHP library to parse XML Sitemaps compliant with the Sitemaps.org protocol.
The Sitemaps.org protocol is the leading standard and is supported by Google, Bing, Yahoo, Ask and many others.
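For orientation, here is a minimal sketch of what a Sitemaps.org `urlset` document looks like and how its tags map onto PHP values. It uses PHP's SimpleXML extension directly rather than this library, and the example.com URLs are placeholders.

```php
<?php
// A small sitemap document; the second entry omits the optional tags.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
  </url>
</urlset>
XML;

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('The sitemap is not well-formed XML.');
}

// Collect each <url> entry, keyed by its <loc>; missing optional tags become null.
$urls = [];
foreach ($doc->url as $entry) {
    $urls[(string) $entry->loc] = [
        'lastmod'    => isset($entry->lastmod) ? (string) $entry->lastmod : null,
        'changefreq' => isset($entry->changefreq) ? (string) $entry->changefreq : null,
        'priority'   => isset($entry->priority) ? (string) $entry->priority : null,
    ];
}

foreach ($urls as $loc => $tags) {
    echo $loc . PHP_EOL;
}
```

Only `loc` is required by the protocol, which is why the optional tags are read defensively here.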
Features
- Basic parsing
- Recursive parsing
- String parsing
- Custom User-Agent string
- Proxy support
- URL blacklist
- Request throttling (using https://github.com/hamburgscleanest/guzzle-advanced-throttle)
- Automatic retry (using https://github.com/caseyamcl/guzzle_retry_middleware)
- Advanced logging (using https://github.com/gmponos/guzzle_logger)
Formats supported
- XML (.xml)
- Compressed XML (.xml.gz)
- Robots.txt rule sheet (robots.txt)
- Line-separated text (disabled by default)
Requirements:
- PHP 5.6 or 7.0+, alternatively HHVM
- PHP extensions: mbstring, libxml, SimpleXML
- Optional:
  - https://github.com/caseyamcl/guzzle_retry_middleware
  - https://github.com/hamburgscleanest/guzzle-advanced-throttle
Installation
The library can be installed via Composer. Add the package to the require section of your composer.json file, then run composer update.
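A sketch of the composer.json entry this step refers to. The version constraint `^1.0` is an assumption; check the package's release list for the constraint that matches your PHP and Guzzle versions.

```json
{
    "require": {
        "vipnytt/sitemapparser": "^1.0"
    }
}
```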
Getting Started
Basic example
Returns a list of URLs only.
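A minimal sketch of the basic case, assuming the package is installed through Composer and that the class and exception names match the upstream README (`vipnytt\SitemapParser` and `vipnytt\SitemapParser\Exceptions\SitemapParserException`). The sitemap URL is a placeholder.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser();
    // Placeholder URL: replace with the sitemap you want to read.
    $parser->parse('https://example.com/sitemap.xml');

    // getURLs() is keyed by URL; the tag values are ignored in this example.
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage() . PHP_EOL;
}
```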
Advanced
Returns all available tags, for both Sitemaps and URLs.
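A sketch of reading the tags as well, under the same assumptions as above. The first constructor argument is taken to be the custom User-Agent string, and the tag keys (`lastmod`, `changefreq`, `priority`) follow the Sitemaps.org tag names; the URL is a placeholder.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

// Print one tag, falling back to "n/a" when it is missing or empty.
function printTag($label, array $tags, $key)
{
    echo $label . ': ' . (isset($tags[$key]) ? $tags[$key] : 'n/a') . PHP_EOL;
}

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parse('https://example.com/sitemap.xml');

    // Sitemaps referenced from a sitemap index.
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'Sitemap: ' . $url . PHP_EOL;
        printTag('LastMod', $tags, 'lastmod');
    }

    // Regular URL entries.
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        printTag('LastMod', $tags, 'lastmod');
        printTag('ChangeFreq', $tags, 'changefreq');
        printTag('Priority', $tags, 'priority');
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage() . PHP_EOL;
}
```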
Recursive
Parses any sitemap detected while parsing, to get a complete list of URLs.
Use url_black_list to skip sitemaps that are part of a parent sitemap. Exact match only.
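A sketch of recursive parsing with a blacklist entry, assuming the `parseRecursive` method from the upstream README and that `url_black_list` is a key of the configuration array passed as the second constructor argument. Both URLs are hypothetical.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

$config = [
    // Child sitemaps to skip; entries must match the sitemap URL exactly.
    'url_black_list' => [
        'https://example.com/sitemap-archive.xml',
    ],
];

try {
    $parser = new SitemapParser('MyCustomUserAgent', $config);
    // Starting from robots.txt lets the parser discover the listed sitemaps.
    $parser->parseRecursive('https://example.com/robots.txt');

    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage() . PHP_EOL;
}
```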
Parsing of line separated text strings
Note: This is disabled by default to avoid false positives when the parser expects XML but fetches plain text instead.
To disable strict standards, set the strict option to false in the configuration array passed as the second constructor parameter.
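A sketch of enabling line-separated text parsing, assuming the option key is `strict` and that it belongs to the configuration array in the second constructor argument. The URL is a placeholder for a plain-text URL list.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    // strict => false allows plain text with one URL per line.
    $parser = new SitemapParser('MyCustomUserAgent', ['strict' => false]);
    $parser->parse('https://example.com/urllist.txt');

    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage() . PHP_EOL;
}
```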
Throttling
- Install the middleware.
- Define host rules.
- Create a handler stack.
- Create the middleware.
- Create a client manually.
- Pass the client as an argument or use the setClient method.
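The steps above can be sketched roughly as follows. The class names, the rule keys (`max_requests`, `request_interval`) and the way the middleware is pushed are taken from the guzzle-advanced-throttle README as best recalled, so treat them as assumptions and verify them against the version you install. The host and limits are placeholders.

```php
<?php
// Step 1 (shell): composer require hamburgscleanest/guzzle-advanced-throttle
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use hamburgscleanest\GuzzleAdvancedThrottle\Middleware\ThrottleMiddleware;
use hamburgscleanest\GuzzleAdvancedThrottle\RequestLimitRuleset;
use vipnytt\SitemapParser;

// Step 2: host rules (assumed format: at most 20 requests per 1 second).
$rules = new RequestLimitRuleset([
    'https://example.com' => [
        [
            'max_requests'     => 20,
            'request_interval' => 1,
        ],
    ],
]);

// Step 3: handler stack with Guzzle's default handler.
$stack = HandlerStack::create();

// Step 4: create the middleware and push it onto the stack.
$throttle = new ThrottleMiddleware($rules);
$stack->push($throttle());

// Step 5: create the client manually so it uses the stack.
$client = new Client(['handler' => $stack]);

// Step 6: hand the client to the parser.
$parser = new SitemapParser();
$parser->setClient($client);
$parser->parse('https://example.com/sitemap.xml');
```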
More details about this middleware are available in its repository: https://github.com/hamburgscleanest/guzzle-advanced-throttle
Automatic retry
- Install the middleware.
- Create a handler stack.
- Add the middleware to the stack.
- Create a client manually.
- Pass the client as an argument or use the setClient method.
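A sketch of the retry setup, assuming the `GuzzleRetry\GuzzleRetryMiddleware::factory()` entry point documented by guzzle_retry_middleware and its default retry settings. The sitemap URL is a placeholder.

```php
<?php
// Step 1 (shell): composer require caseyamcl/guzzle_retry_middleware
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleRetry\GuzzleRetryMiddleware;
use vipnytt\SitemapParser;

// Step 2: handler stack with Guzzle's default handler.
$stack = HandlerStack::create();

// Step 3: add the retry middleware with its default options.
$stack->push(GuzzleRetryMiddleware::factory());

// Step 4: create the client manually so it uses the stack.
$client = new Client(['handler' => $stack]);

// Step 5: hand the client to the parser.
$parser = new SitemapParser();
$parser->setClient($client);
$parser->parse('https://example.com/sitemap.xml');
```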
More details about this middleware are available in its repository: https://github.com/caseyamcl/guzzle_retry_middleware
Advanced logging
- Install the middleware.
- Create a PSR-3 style logger.
- Create a handler stack.
- Push the logger middleware to the stack.
- Create a client manually.
- Pass the client as an argument or use the setClient method.
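A sketch of the logging setup. Monolog is used here only as one example of a PSR-3 logger and is an assumption, not a requirement of this package; the `GuzzleLogMiddleware\LogMiddleware` class name comes from the guzzle_logger documentation. The channel name and URL are placeholders.

```php
<?php
// Step 1 (shell): composer require gmponos/guzzle_logger monolog/monolog
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleLogMiddleware\LogMiddleware;
use Monolog\Handler\StreamHandler;
use Monolog\Logger;
use vipnytt\SitemapParser;

// Step 2: any PSR-3 logger works; this one writes to standard output.
$logger = new Logger('sitemap_parser');
$logger->pushHandler(new StreamHandler('php://stdout'));

// Step 3: handler stack with Guzzle's default handler.
$stack = HandlerStack::create();

// Step 4: push the logger middleware onto the stack.
$stack->push(new LogMiddleware($logger));

// Step 5: create the client manually so it uses the stack.
$client = new Client(['handler' => $stack]);

// Step 6: hand the client to the parser.
$parser = new SitemapParser();
$parser->setClient($client);
$parser->parse('https://example.com/sitemap.xml');
```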
More details about this middleware's configuration (such as log levels, when to log, and what to log) are available in its repository: https://github.com/gmponos/guzzle_logger
Additional examples
Even more examples are available in the examples directory.
Configuration
Available configuration options, with their default values:
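A sketch of the configuration array, built from the options named on this page (`strict`, `url_black_list`, and the GuzzleHttp request options). The key name `guzzle` and the empty defaults are assumptions; `strict` defaults to true because line-separated text parsing is disabled by default.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use vipnytt\SitemapParser;

$config = [
    // (bool) When true, line-separated plain text is not parsed.
    'strict' => true,
    // GuzzleHttp request options (assumed key name), e.g. proxy or timeout.
    'guzzle' => [],
    // Sitemap URLs to skip during recursive parsing. Exact match only.
    'url_black_list' => [],
];

$parser = new SitemapParser('MyCustomUserAgent', $config);
```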
If a User-Agent is also set using the GuzzleHttp request options, it takes the highest priority and replaces the other User-Agent.
All versions of sitemapparser with dependencies
- guzzlehttp/guzzle: ^6.0 || ^7.0
- ext-mbstring: *
- ext-simplexml: *
- lib-libxml: *