Spider
A flexible spider in PHP.
Concepts
A spider contains a chain of processors called pipes. You can pass as many tasks as you like to the spider; each task goes through these pipes and gets processed.
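The flow described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the package's implementation: `MiniSpider` and its `tasks()` getter are hypothetical stand-ins, tasks are plain `ArrayObject` instances, and the `($spider, $task)` callback signature is an assumption. It is written for PHP 8, although the package itself targets PHP 5.3+.

```php
<?php
// Hypothetical stand-in for illustration only; NOT the ddliu/spider implementation.
final class MiniSpider
{
    /** @var callable[] pipes, in the order they were added */
    private array $pipes = [];
    /** @var ArrayObject[] queued tasks */
    private array $tasks = [];

    public function pipe(callable $pipe): self
    {
        $this->pipes[] = $pipe;
        return $this;
    }

    public function addTask(array $data): self
    {
        // ArrayObject gives array-style access while being passed by handle.
        $this->tasks[] = new ArrayObject($data);
        return $this;
    }

    public function run(): self
    {
        // Every task goes through every pipe, in order.
        foreach ($this->tasks as $task) {
            foreach ($this->pipes as $pipe) {
                $pipe($this, $task);
            }
        }
        return $this;
    }

    /** @return ArrayObject[] */
    public function tasks(): array
    {
        return $this->tasks;
    }
}

$spider = (new MiniSpider())
    // First pipe: tidy up the URL.
    ->pipe(function ($spider, $task) {
        $task['url'] = strtolower(trim($task['url']));
    })
    // Second pipe: derive the host from the tidied URL.
    ->pipe(function ($spider, $task) {
        $task['host'] = parse_url($task['url'], PHP_URL_HOST);
    })
    ->addTask(['url' => '  HTTP://Example.com/Page '])
    ->addTask(['url' => 'http://php.net/manual/'])
    ->run();

foreach ($spider->tasks() as $task) {
    echo $task['url'], ' => ', $task['host'], PHP_EOL;
}
```

Because each pipe only reads and writes keys on the task, pipes stay independent of each other and can be reordered or reused across spiders.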
Installation
Requirements
- PHP 5.3+
- curl (required by RequestPipe)
Dependencies
See composer.json.
Usage
Find more examples in the examples folder.
Spider
The Spider class.
Options
- limit: maximum number of tasks to run
Methods
- pipe($pipe): add a pipe
- addTask($task): add a task
- run(): run the spider
- report(): write report to log
Task
A task contains the data array and some helper functions.
The Task class implements the ArrayAccess interface, so you can access its data like an array.
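To show what array-style access through PHP's built-in `ArrayAccess` interface looks like, here is a small sketch. `DemoTask` is a hypothetical class written for this example and makes no claim about how the package's own Task class is implemented; it targets PHP 8 (it uses the `mixed` type).

```php
<?php
// Hypothetical stand-in; not the package's Task class.
final class DemoTask implements ArrayAccess
{
    private array $data;

    public function __construct(array $data = [])
    {
        $this->data = $data;
    }

    // Called by isset($task['key']).
    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    // Called by $task['key']; missing keys yield null here.
    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset] ?? null;
    }

    // Called by $task['key'] = $value and by $task[] = $value.
    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;
        } else {
            $this->data[$offset] = $value;
        }
    }

    // Called by unset($task['key']).
    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}

$task = new DemoTask(['url' => 'http://example.com/']);
$task['content'] = '<html></html>';

$hasContentBefore = isset($task['content']);
unset($task['content']);
$hasContentAfter = isset($task['content']);

echo $task['url'], PHP_EOL;
var_dump($hasContentBefore, $hasContentAfter);
```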
Methods
- fork($task): add a sub task to the spider
- ignore(): ignore the task
Pipes
Pipes define how each task is processed.
A pipe can be a plain function, or a class that extends BasePipe.
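The two forms might look like the sketch below. Several things here are assumptions rather than confirmed API: the `($spider, $task)` parameter list, the `run()` method name, and the local abstract `BasePipe`, which is a hypothetical stand-in defined only so the example runs on its own (with the package installed you would extend its own BasePipe instead). `WordCountPipe` and the `title` and `words` keys are likewise made up for illustration.

```php
<?php
// Hypothetical stand-in so the sketch is self-contained; the real base class
// ships with the package and may differ.
abstract class BasePipe
{
    abstract public function run($spider, $task);
}

// Form 1: a plain function (closure) that receives the spider and the task.
$titlePipe = function ($spider, $task) {
    if (preg_match('/<title>(.*?)<\/title>/is', $task['content'], $m)) {
        $task['title'] = trim($m[1]);
    }
};

// Form 2: a class extending the base pipe, useful when a pipe needs
// configuration or internal state.
class WordCountPipe extends BasePipe
{
    public function run($spider, $task)
    {
        $task['words'] = str_word_count(strip_tags($task['content']));
    }
}

// Drive both pipes by hand; ArrayObject stands in for a task here.
$task = new ArrayObject(['content' => '<title> Hello </title><p>two words</p>']);
$titlePipe(null, $task);
(new WordCountPipe())->run(null, $task);

echo $task['title'], ' / ', $task['words'], PHP_EOL;
```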
Useful Pipes
NormalizeUrlPipe
Normalize $task['url'].
RequestPipe
Start an HTTP request with $task['url'] and save the result in $task['content'].
FileCachePipe
Cache a pipe (e.g. RequestPipe).
RetryPipe
Retry on failure.
DomCrawlerPipe
Create a DomCrawler from $task['content']. Access it with $task['$dom'] in the following pipes.
ReportPipe
Report every 10 minutes.
Logging
$spider->logger is an instance of Monolog\Logger. You can add logging handlers to it before starting the spider.
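Attaching a handler could look like the fragment below. It assumes the package is installed through Composer, that the class lives at `ddliu\spider\Spider`, and that it can be constructed without arguments; none of these is confirmed here. The log path is a placeholder. `pushHandler()`, `StreamHandler`, and `Logger::WARNING` are standard Monolog 1.x API.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use ddliu\spider\Spider;            // assumed namespace
use Monolog\Handler\StreamHandler;
use Monolog\Logger;

$spider = new Spider();             // assumed: no required constructor arguments

// Write warnings and above to a log file before any task runs.
$spider->logger->pushHandler(
    new StreamHandler(__DIR__ . '/spider.log', Logger::WARNING)
);
```

Handlers pushed later are consulted first, so a more specific handler can be stacked on top of a general one.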
TODO/Ideas
- Real world examples.
- Running tasks concurrently (with pthreads?).
Alternative
Use the Golang version for better performance!
Package dependencies (ddliu/spider)
- ddliu/normurl: ~0.1.1
- monolog/monolog: ~1.11
- symfony/dom-crawler: ~2.5
- symfony/css-selector: ~2.5
- ddliu/filecache: ~0.1
- ddliu/wildcards: 0.1.*