Download the PHP package mihaeu/tarantula without Composer
On this page you can find all versions of the PHP package mihaeu/tarantula. These versions can be downloaded and installed without Composer. Any dependencies are resolved automatically.
Package tarantula
Short Description Another PHP crawler based on Guzzle.
License MIT
Homepage https://github.com/mihaeu/tarantula
Information about the package tarantula
Tarantula
Tarantula is a web crawler written in PHP. It utilizes the amazing work of the people behind Guzzle and Symfony's DomCrawler.
Installation
Global tool
Make sure ~/.composer/bin is in your $PATH and then simply execute:
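A minimal sketch of that step, assuming Composer's standard global install command and the 1.* version constraint mentioned for the library install. The PATH check and the COMPOSER_BIN variable are illustrative additions, not part of Tarantula itself.

```shell
# Composer's global bin directory, as named in the text above.
COMPOSER_BIN="$HOME/.composer/bin"

# Prepend it to PATH only if it is not already listed.
case ":$PATH:" in
  *":$COMPOSER_BIN:"*) ;;
  *) PATH="$COMPOSER_BIN:$PATH"; export PATH ;;
esac

# Install Tarantula globally (assumed command; needs Composer and network access).
if command -v composer >/dev/null 2>&1; then
  composer global require 'mihaeu/tarantula:1.*'
else
  echo "Composer not found on PATH" >&2
fi
```

To make the PATH change permanent, the same export line would typically go into your shell's startup file.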
Library
Assuming you are using Composer, add "mihaeu/tarantula": "1.*" to the require section of your composer.json file, or use Composer's CLI tool: composer require mihaeu/tarantula:1.*
Usage
Global tool
Right now the only command available is crawl. Some usage examples would be:
For all arguments and options use the help command:
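A hedged sketch of that call. It assumes the executable is named tarantula (after bin/tarantula) and is on your PATH, and that the help command is the one built into Symfony Console, which Tarantula depends on. The TARANTULA variable is only an illustrative convenience.

```shell
# Executable name, taken from bin/tarantula; adjust if yours differs.
TARANTULA=tarantula

if command -v "$TARANTULA" >/dev/null 2>&1; then
  # Symfony Console's built-in help: prints the arguments and options of `crawl`.
  "$TARANTULA" help crawl
else
  echo "$TARANTULA not found on PATH" >&2
fi
```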
Library
Have a look at the tests to see what's possible or just try the following in your code:
All HTTP requests go through Guzzle, and any configuration accepted by Guzzle's request object can also be passed to Tarantula's HttpClient.
Tests
Test coverage is below 100%. This was an afternoon project, and testing a crawler takes a lot of time because of the test setup it requires.
If you want to get a quick overview of the project, I recommend running the test suite with the --testdox flag:
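A sketch of that invocation, assuming you are in the project root and that PHPUnit is available either under vendor/bin (after composer install) or as a global phpunit. Both locations are assumptions about your setup; only the --testdox flag comes from the text above.

```shell
# Prefer the project-local PHPUnit, fall back to a global one.
PHPUNIT=vendor/bin/phpunit
[ -x "$PHPUNIT" ] || PHPUNIT=phpunit

if command -v "$PHPUNIT" >/dev/null 2>&1; then
  # --testdox prints each test as a readable sentence describing behaviour.
  "$PHPUNIT" --testdox
else
  echo "PHPUnit not found; run 'composer install' first" >&2
fi
```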
To Do
- [ ] filters (url, filetype, etc.)
- [ ] allow for Guzzle to be configured via command line
- [ ] more actions (save plain result, crawl via DOM/XPath, ...)
Troubleshooting
Composer global install fails
This is most likely due to a conflict with the requirements of other global installs. Unfortunately, Composer's architecture doesn't offer a solution for this yet. I tried to keep Tarantula's requirements loose to avoid this problem.
If you want to have Tarantula available throughout your system, just install to another directory (e.g. using composer create-project) and symlink bin/tarantula into a folder in your $PATH.
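The workaround could look roughly like this. INSTALL_DIR, BIN_DIR and the link_tarantula helper are hypothetical names chosen for illustration; pick any install location and any directory that is already in your $PATH.

```shell
INSTALL_DIR="$HOME/tarantula"   # hypothetical install location
BIN_DIR="$HOME/bin"             # hypothetical; must be listed in $PATH

# link_tarantula INSTALL_DIR BIN_DIR
# Symlinks INSTALL_DIR/bin/tarantula into BIN_DIR, creating BIN_DIR if needed.
link_tarantula() {
  mkdir -p "$2" && ln -sf "$1/bin/tarantula" "$2/tarantula"
}

# Install into its own directory so its dependencies cannot clash with
# other global packages, then expose the executable via the symlink.
if command -v composer >/dev/null 2>&1; then
  composer create-project mihaeu/tarantula "$INSTALL_DIR" \
    && link_tarantula "$INSTALL_DIR" "$BIN_DIR"
else
  echo "Composer not found on PATH" >&2
fi
```

Because the project lives in its own directory, its dependencies are resolved separately from everything installed with composer global.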
Thanks to
- Symfony/SensioLabs and especially Fabien Potencier for what he does for PHP (for this particular project the DomCrawler)
- the Guzzle team for their awesome HTTP client
- Aha Soft for the logo
- the Composer team for revolutionizing the way I and many others write PHP
- GitHub for redefining collaboration
- Travis CI for improving the quality and compatibility of thousands of open source projects
- Sebastian Bergmann for PHPUnit and many other awesome QA tools
License
MIT, see LICENSE file.
All versions of tarantula with dependencies
guzzle/guzzle Version 3.*
symfony/console Version 2.*
symfony/process Version 2.*
symfony/filesystem Version 2.*
symfony/dom-crawler Version 2.*
symfony/css-selector Version 2.*
zaininnari/html-minifier Version *