Download the PHP package coooold/crawler without Composer
On this page you can find all versions of the PHP package coooold/crawler. These versions can be downloaded and installed without Composer. Dependencies are resolved automatically.
Information about the package crawler
PHPCrawler
A powerful, production-ready crawling/scraping package for PHP. Happy hacking :)
Features:
- Server-side DOM and automatic DomParser insertion with Symfony\Component\DomCrawler
- Configurable pool size and retries
- Rate limit control
- forceUTF8 mode, which lets the crawler handle charset detection and conversion for you
- Compatible with PHP 7.2
Thanks to
- Amp: a non-blocking concurrency framework for PHP
- Artax: an asynchronous HTTP client for PHP
- node-crawler: most powerful, popular and production crawling/scraping package for Node
node-crawler is a great crawler, and PHPCrawler does its best to stay similar to it.
Chinese documentation (中文说明)
Table of Contents
- Get started
  - Install
  - Basic usage
  - Slow down
  - Custom parameters
  - Raw body
- Events
  - Event: response
  - Event: drain
- Advanced
  - Encoding
  - Logger
  - Coroutine
- Other
  - API reference
  - Configuration
- Work with DomParser
Get started
Install
Basic usage
Slow down
Use rateLimit to slow down requests when you are visiting websites.
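As a rough illustration of what a rate limit does, the sketch below enforces a minimum gap between task starts. MinGapLimiter and reserve() are hypothetical names made up for this example and are not part of PHPCrawler; the millisecond unit is an assumption borrowed from node-crawler.

```php
<?php
// Hypothetical helper, NOT part of PHPCrawler: enforces a minimum gap
// (assumed to be in milliseconds) between the start times of tasks.
final class MinGapLimiter
{
    private $gapMs;
    private $nextAllowedMs = 0;

    public function __construct(int $gapMs)
    {
        $this->gapMs = $gapMs;
    }

    // Reserves the next free slot and returns how long the caller
    // should wait (in ms) before starting a task at time $nowMs.
    public function reserve(int $nowMs): int
    {
        $startMs = max($nowMs, $this->nextAllowedMs);
        $this->nextAllowedMs = $startMs + $this->gapMs;
        return $startMs - $nowMs;
    }
}

$limiter = new MinGapLimiter(1000);
$waits = [
    $limiter->reserve(0),    // first task starts immediately
    $limiter->reserve(0),    // second task must wait for the gap
    $limiter->reserve(2500), // the gap has already passed, no wait
];
print_r($waits);
```

With a 1000 ms gap the three calls return waits of 0, 1000 and 0 ms. A real crawler would pair such a calculation with a non-blocking timer rather than sleeping.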
Custom parameters
Sometimes you need to access variables from a previous request/response session. To do so, pass them as parameters in the same way as options, then access them in the callback via $res->task['parameter1'], $res->task['parameter2'], and so on.
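The idea can be sketched with plain PHP stand-ins. Only the $res->task[...] access pattern comes from the text above; the 'uri' key, the stdClass response and the callback signature are assumptions used for illustration, not the library's confirmed API.

```php
<?php
// Stand-in for a queued task: an option plus two custom parameters.
// The 'uri' key name is an assumption for this sketch.
$task = [
    'uri'        => 'https://example.com/page/2',
    'parameter1' => 'value1',
    'parameter2' => ['page' => 2],
];

// Stand-in for the response object: only the `task` property
// described in the text is modelled here.
$res = new stdClass();
$res->task = $task;

// The callback reads the custom parameters back from the task.
$callback = function ($res) {
    return [
        $res->task['parameter1'],
        $res->task['parameter2']['page'],
    ];
};

$seen = $callback($res);
print_r($seen);
```

Because the custom keys travel with the task, the callback does not need to capture outer variables to know which request it is handling.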
Raw body
If you are downloading files such as images, PDFs, or Word documents, you have to save the raw response body, which means the crawler should not convert it to a string. To do this, set encoding to null.
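To see why binary bodies must be left untouched, the sketch below runs a charset conversion over the eight-byte PNG signature and compares it with the raw bytes. It uses only PHP built-ins and does not call PHPCrawler; the blocking file functions are used here only because the script runs outside any crawler callback.

```php
<?php
// The first eight bytes of every PNG file (the PNG signature).
$raw = "\x89PNG\r\n\x1a\n";

// Treating the bytes as ISO-8859-1 text and converting them to UTF-8
// rewrites byte 0x89 as the two bytes 0xC2 0x89, corrupting the file.
$converted = iconv('ISO-8859-1', 'UTF-8', $raw);

// Writing the raw bytes keeps them intact.
$path = tempnam(sys_get_temp_dir(), 'raw');
file_put_contents($path, $raw);
$roundTrip = file_get_contents($path);
unlink($path);

var_dump(strlen($raw), strlen($converted), $roundTrip === $raw);
```

The converted copy is one byte longer than the original, which is enough to make an image unreadable.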
Events
Event::RESPONSE
Triggered when a request is done.
Event::DRAIN
Triggered when the queue is empty.
Advanced
Encoding
The HTTP body is converted to UTF-8 from the default encoding.
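What such a conversion amounts to can be shown with PHP's built-in iconv. This is a sketch under the assumption that the source charset is already known to be ISO-8859-1; how PHPCrawler itself detects the charset is not covered here.

```php
<?php
// "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8.
$latin1 = "caf\xE9";

// Convert from the known source charset to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// preg_match with the /u flag succeeds only on valid UTF-8 input.
$validBefore = @preg_match('//u', $latin1) === 1;
$validAfter  = preg_match('//u', $utf8) === 1;

var_dump($validBefore, $validAfter, bin2hex($utf8));
```

After conversion the accented character is stored as the two bytes 0xC3 0xA9, so the string passes UTF-8 validation.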
Logger
A PSR-3 logger instance can be used.
See Monolog Reference.
Coroutine
PHPCrawler is built on Amp, a non-blocking concurrency framework, so it can work with coroutines and perform well under concurrent load. Use Amp's async packages inside callbacks: neither PHP's native MySQL client nor native file I/O is recommended, because they block. The yield keyword plays a role similar to await in JavaScript (ES2017): it marks the point where a coroutine waits on non-blocking I/O.
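The role of yield can be illustrated with plain PHP generators. The toy round-robin scheduler below is not Amp and is not how PHPCrawler is implemented; worker() and runAll() are hypothetical names. In Amp, the event loop resumes a coroutine when the yielded promise resolves, whereas this sketch simply resumes each task in turn.

```php
<?php
// A task is a generator; each `yield` hands control back to the scheduler.
function worker(string $name, int $steps, array &$log): Generator
{
    for ($i = 1; $i <= $steps; $i++) {
        $log[] = "$name step $i";
        yield; // suspension point: real code would wait on I/O here
    }
}

// Toy scheduler: resumes every unfinished task in turn until all are done.
function runAll(array $tasks): void
{
    // Start every task, running each one up to its first yield.
    foreach ($tasks as $task) {
        $task->current();
    }
    while ($tasks) {
        foreach ($tasks as $key => $task) {
            if (!$task->valid()) {
                unset($tasks[$key]); // finished, drop it
                continue;
            }
            $task->next(); // resume until the next yield or the end
        }
    }
}

$log = [];
runAll([worker('A', 2, $log), worker('B', 2, $log)]);
print_r($log);
```

The log shows the two tasks interleaving (A step 1, B step 1, A step 2, B step 2), because neither task holds on to control while it is suspended. A blocking call inside a task would stall every other task, which is why blocking clients are discouraged in callbacks.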
Work with DomParser
Symfony\Component\DomCrawler is a handy tool for crawling pages. Response::dom will be injected with an instance of Symfony\Component\DomCrawler\Crawler.
See DomCrawler Reference.
Other
API reference
Configuration
All versions of crawler with dependencies
ext-json Version *
amphp/amp Version ^2.4
amphp/artax Version ^3.0
symfony/dom-crawler Version ^5.0
psr/log Version ^1.0.1
symfony/css-selector Version ^5.0