Download the PHP package crwlr/crawler without Composer
On this page you can find all versions of the php package crwlr/crawler. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Information about the package crawler
Library for Rapid (Web) Crawler and Scraper Development
This library provides a kind of framework and a lot of ready-to-use, so-called steps that you can use as building blocks to build your own crawlers and scrapers.
To give you an overview, here's a list of things that it helps you with:
- Crawler Politeness 😇 (respecting robots.txt, throttling,...)
- Load URLs using
  - a (PSR-18) HTTP client (default is of course Guzzle)
  - or a headless browser (Chrome) to get source after JavaScript execution
- Get absolute links from HTML documents 🔗
- Get sitemaps from robots.txt and get all URLs from those sitemaps
- Crawl (load) all pages of a website 🕷
- Use cookies (or don't) 🍪
- Use any HTTP methods (GET, POST,...) and send any headers or body
- Easily iterate over paginated list pages 🔁
- Extract data from:
- Extract schema.org structured data in JSON-LD format from HTML documents
- Keep memory usage low by using PHP Generators 💪
- Cache HTTP responses during development, so you don't have to load pages again and again after every code change
- Get logs about what your crawler is doing (accepts any PSR-3 LoggerInterface)
- And a lot more...
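To illustrate the point about keeping memory usage low with PHP Generators, here is a minimal, self-contained sketch of two chained "steps". It does not use crwlr/crawler and is not the library's implementation: the function names `loadPages` and `extractTitles`, the example URLs, and the fake page bodies are all hypothetical and exist only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a loading step. It yields one page at a time
// instead of building an array of all pages, so only the current page
// is held in memory.
function loadPages(iterable $urls): Generator
{
    foreach ($urls as $url) {
        // Assumption: a real crawler would perform an HTTP request here.
        yield ['url' => $url, 'body' => "<title>Page for {$url}</title>"];
    }
}

// Hypothetical extraction step. It consumes pages lazily and yields
// one result per page that contains a <title> element.
function extractTitles(iterable $pages): Generator
{
    foreach ($pages as $page) {
        if (preg_match('~<title>(.*?)</title>~s', $page['body'], $match) === 1) {
            yield ['url' => $page['url'], 'title' => $match[1]];
        }
    }
}

$urls = ['https://example.com/a', 'https://example.com/b'];

// Nothing is loaded until the loop asks for the next result.
foreach (extractTitles(loadPages($urls)) as $result) {
    echo $result['url'], ' => ', $result['title'], PHP_EOL;
}
```

Because each function yields instead of returning a full array, a page is only produced when the consuming loop asks for the next result, which is the general idea behind generator-based pipelines.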
Documentation
You can find the documentation at crwlr.software.
Contributing
If you consider contributing something to this package, read the contribution guide (CONTRIBUTING.md).
All versions of crawler with dependencies
Requires
- ext-dom: *
- php: ^8.1
- crwlr/robots-txt: ^1.1
- crwlr/schema-org: ^0.2|^0.3
- crwlr/url: ^2.1
- psr/log: ^2.0|^3.0
- symfony/dom-crawler: ^6.0|^7.0
- symfony/css-selector: ^6.0|^7.0
- psr/simple-cache: ^1.0|^2.0|^3.0
- guzzlehttp/guzzle: ^7.4
- adbario/php-dot-notation: ^3.1
- chrome-php/chrome: ^1.7
- crwlr/utils: ^1.2
- crwlr/html-2-text: ^0.1.0
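When installing without Composer, the platform requirements above are not checked for you. The following is a small sketch of how the two platform entries (`php ^8.1` and `ext-dom *`) could be verified by hand. The helper name `meetsPhpRequirement` is hypothetical, and the check assumes the usual caret semantics, where `^8.1` means at least 8.1.0 and below 9.0.0.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true if the given version satisfies "^8.1",
// i.e. >= 8.1.0 and < 9.0.0 (caret constraint semantics).
function meetsPhpRequirement(string $phpVersion): bool
{
    return version_compare($phpVersion, '8.1.0', '>=')
        && version_compare($phpVersion, '9.0.0', '<');
}

// "ext-dom: *" only requires that the DOM extension is loaded, in any version.
$phpOk = meetsPhpRequirement(PHP_VERSION);
$domOk = extension_loaded('dom');

echo 'PHP version: ', $phpOk ? 'ok' : 'not supported', PHP_EOL;
echo 'ext-dom: ', $domOk ? 'ok' : 'missing', PHP_EOL;
```

The library dependencies in the list still have to be present in compatible versions; this sketch covers only the PHP runtime and the DOM extension.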