Download the PHP package rajanrx/php-scrape without Composer
On this page you can find all versions of the php package rajanrx/php-scrape. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package php-scrape
Short Description A scraping framework written in PHP
License MIT
Homepage https://github.com/rajanrx/php-scrape
Information about the package php-scrape
PHP Scrape
A simple, easy-to-use, scalable scraping framework written in PHP
About PHP Scrape
PHP Scrape is a basic scraping framework for PHP built on a configuration-first concept: once a scraper is implemented, changes should be made in the configuration file wherever possible, avoiding the need to update or add code. You can also extend or customize the framework to any level, or use its components (Extractor, Crawler) separately if that is all you need.
The following are the key features that are available now or planned for the future:
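To make the configuration-first idea concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. It is not the php-scrape API: the function name `extractFields` and the shape of the `$config` array are assumptions made for illustration. The point is that when the source page changes, only the XPath expressions in the array need to be edited.

```php
<?php
// Hypothetical illustration of configuration-first scraping.
// extractFields() and the $config shape are NOT part of php-scrape.

/**
 * Extract fields from HTML using XPath expressions taken from a config array.
 *
 * @param string $html   Raw HTML to parse.
 * @param array  $config Map of field name => XPath expression (assumed shape).
 * @return array Map of field name => trimmed text of the first match, or null.
 */
function extractFields(string $html, array $config): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($config as $field => $expression) {
        $nodes = $xpath->query($expression);
        $result[$field] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $result;
}

// The "configuration": edit these expressions when the page layout changes.
$config = [
    'title' => '//h1',
    'price' => '//span[@class="price"]',
];

$html = '<html><body><h1>Blue Widget</h1><span class="price"> $9.99 </span></body></html>';
$fields = extractFields($html, $config);
```

Here `$fields` holds one entry per configured field, with `null` for any expression that matches nothing.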
- [x] Scrape in console or browser
- [x] Use hashes to avoid duplicate scraping (or halt further scraping)
- [x] Generate editable configuration file using PHP code
- [x] Ability to extend the framework with your own scraping components
- [ ] Add a complete wiki with general and advanced usage instructions
- [x] Add test coverage for command line scraping (> 80%)
- [ ] Add test coverage for JavaScript scraping
- [ ] Allow use of a proxy to scrape anonymously
- [ ] Generate automated integration tests for scraping to ensure data integrity
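The hash-based duplicate check in the list above can be sketched roughly as follows. This is only an illustration of the general technique, not how php-scrape implements it: the helper names `recordHash` and `filterNewRecords`, and the choice of SHA-256 over the JSON encoding of a record, are assumptions.

```php
<?php
// Hypothetical sketch of hash-based duplicate detection.
// These helpers are NOT part of php-scrape.

/** Build a stable hash for a flat scraped record. */
function recordHash(array $record): string
{
    // Sort keys so that field order does not change the hash.
    ksort($record);
    return hash('sha256', json_encode($record));
}

/**
 * Return only records whose hash has not been seen before,
 * and remember the hashes of the records that are returned.
 */
function filterNewRecords(array $records, array &$seenHashes): array
{
    $fresh = [];
    foreach ($records as $record) {
        $hash = recordHash($record);
        if (isset($seenHashes[$hash])) {
            continue; // Already scraped: skip it (a crawler could also stop here).
        }
        $seenHashes[$hash] = true;
        $fresh[] = $record;
    }
    return $fresh;
}

$seen = [];
$firstRun = filterNewRecords([
    ['id' => 1, 'title' => 'A'],
    ['title' => 'A', 'id' => 1], // Same record, different key order.
    ['id' => 2, 'title' => 'B'],
], $seen);
$secondRun = filterNewRecords([
    ['id' => 2, 'title' => 'B'], // Seen in the first run.
    ['id' => 3, 'title' => 'C'],
], $seen);
```

In a real project the set of seen hashes would be persisted between runs (for example in a file or database) rather than kept in a local array.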
Why yet another Git repo?
One of the biggest problems in scraping data is that the source changes and the codebase has to be updated to keep it working. As the codebase grows it becomes harder to maintain, and finding the right place to make a change is especially frustrating for someone new to the code. Different projects also have their own unique requirements (made even harder by the variety and complexity of data sources), which many libraries do not address because they are not generic enough. To help developers tackle these problems, I have tried to come up with a generic, flexible solution for writing easily configurable, maintainable, and extensible scraping projects.
Getting Started
The easiest way to use PHP Scrape is via Composer: `composer require rajanrx/php-scrape`.
You need to create a configuration file to start scraping. You can either write a config JSON file by hand or generate one with PHP (highly recommended, as it is easier to maintain and scale).
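As a rough illustration of generating a JSON configuration file from PHP, the sketch below builds a plain array and writes it out with `json_encode`. The keys (`name`, `startUrl`, `fields`), the URL, and the file name are placeholders invented for this example; the actual schema expected by php-scrape is not shown here and may differ.

```php
<?php
// Hypothetical config generation; the keys below are placeholders,
// not the real php-scrape configuration schema.

$config = [
    'name' => 'example-products',
    'startUrl' => 'https://example.com/products',
    'fields' => [
        'title' => '//h1',
        'price' => '//span[@class="price"]',
    ],
];

// Pretty-print so the generated file stays easy to read and edit by hand.
$json = json_encode($config, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);

$path = sys_get_temp_dir() . '/scrape-config.json';
file_put_contents($path, $json);

// Reading the file back yields the same structure as an associative array.
$loaded = json_decode(file_get_contents($path), true);
```

Keeping the configuration in PHP means shared fragments can be reused across scrapers, while the generated JSON file remains editable on its own.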
Once you have a configuration file, you can start scraping by writing a few lines of code.
This will return a result like:
It is as simple as that. Detailed docs will be added soon. Until they are available, please see the Multi Row Extractor Test to figure out how to scrape paginated records.
Please let me know if you have any suggestions for making this codebase better. I am happy to assist if you get stuck on your scraping project :). Feel free to ping me. Interested contributors are welcome.
Partners
BrowserStack is supporting PHP Scrape, allowing us to use their service and infrastructure to test the code in this repository. Thank you for supporting the open source community!
License
This framework is open-sourced software licensed under the MIT license.
If you are happy and want to buy me a coffee then why not :).
All versions of php-scrape with dependencies
behat/mink Version ~1.6
behat/mink-browserkit-driver Version ~1.2
behat/mink-goutte-driver Version ~1.1
behat/mink-selenium2-driver Version ~1.2
behat/mink-zombie-driver Version ~1.2
php-curl-class/php-curl-class Version 3.5.5
guzzlehttp/cache-subscriber Version 0.1.0