Download the PHP package daa/web-scraping-sdk without Composer
This page lists all versions of the PHP package daa/web-scraping-sdk. You can download and install these versions without Composer; any dependencies are resolved automatically.
Package: web-scraping-sdk
Short description: Composer package that simplifies web scraping
License: MIT
Information about the package web-scraping-sdk
Web Scraping PHP SDK
This is a Composer package that simplifies web content scraping by providing a lightweight, easy-to-use code base.
Simply extend the Scraper class provided and implement the gather() method to extract the desired content using XPath expressions. You can then write this content to a file, store it in a database, return it as a JSON string, etc.
Highlights:
- XPath-driven extraction of content
- Just one method to implement
- Allows easy file writing, database storage or formatted string/object return
- Follows the PSR-2 coding standard
- Uses cURL to retrieve content from specified source
- Configurable retry count and pause time for failed attempts
- Easily follow links to get additional content
Packagist link: https://packagist.org/packages/daa/web-scraping-sdk
Usage
Add daa/web-scraping-sdk as a requirement in your composer.json file and run composer install or composer update.
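A minimal composer.json fragment might look like the one below. The package name comes from the Packagist link above; the `*` version constraint is only a placeholder, since this page does not state a version, so check Packagist for a suitable constraint.

```json
{
    "require": {
        "daa/web-scraping-sdk": "*"
    }
}
```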
Write your own scraper class that extends Scraper\Sdk\WebScraper and implements the gather() method.
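The "extend and implement one method" pattern can be sketched as follows. This is not the SDK's code: `SketchScraper`, `HeadlineScraper`, `run()` and the `gather(DOMXPath $xpath)` signature are hypothetical stand-ins, and the real Scraper\Sdk\WebScraper may expose a different constructor and method signatures. Only PHP's built-in DOM extension is used, and the HTML is inlined so the sketch runs without network access.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the SDK's base class; the real
// Scraper\Sdk\WebScraper may look quite different.
abstract class SketchScraper
{
    /** Parse the HTML and hand an XPath helper to the subclass. */
    public function run(string $html): array
    {
        $doc = new DOMDocument();
        // Real-world HTML is rarely valid, so collect parser warnings quietly.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        return $this->gather(new DOMXPath($doc));
    }

    /** The single method a concrete scraper has to implement. */
    abstract protected function gather(DOMXPath $xpath): array;
}

// Hypothetical concrete scraper: collects headline text and link targets.
final class HeadlineScraper extends SketchScraper
{
    protected function gather(DOMXPath $xpath): array
    {
        $titles = [];
        foreach ($xpath->query('//h2[@class="title"]/a') as $link) {
            $titles[] = [
                'text' => trim($link->textContent),
                'href' => $link->getAttribute('href'),
            ];
        }
        return $titles;
    }
}

// Inline sample markup instead of a live page.
$html = '<html><body>'
    . '<h2 class="title"><a href="/a">First post</a></h2>'
    . '<h2 class="title"><a href="/b">Second post</a></h2>'
    . '</body></html>';

$items = (new HeadlineScraper())->run($html);
echo json_encode($items), PHP_EOL;
```

The array returned by `gather()` can then be encoded as JSON, written to a file, or inserted into a database, which matches the output options listed in the highlights.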
Now call your class, for example from a script that is executed by a cron job.
With troublesome sources you can specify the retry configuration (the default is 3 retries with a 3-second pause in between).
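To make the retry behaviour concrete, here is a minimal sketch of a retry loop with a pause between attempts. It is an illustration, not the SDK's implementation: `fetchWithRetry()` and `curlGet()` are hypothetical helpers, and reading "3 retries" as one initial attempt plus up to three more is an assumption. The demo uses a fake fetcher that fails twice, so it runs without network access.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative retry helper (hypothetical, not part of the SDK).
 * Calls $fetch once, then retries up to $retries more times,
 * sleeping $pauseSeconds before each retry.
 */
function fetchWithRetry(callable $fetch, int $retries = 3, int $pauseSeconds = 3): string
{
    $lastError = null;
    for ($attempt = 0; $attempt <= $retries; $attempt++) {
        if ($attempt > 0 && $pauseSeconds > 0) {
            sleep($pauseSeconds);
        }
        try {
            return $fetch();
        } catch (RuntimeException $e) {
            $lastError = $e; // remember the failure and try again
        }
    }
    throw new RuntimeException("Giving up after {$retries} retries", 0, $lastError);
}

/** Fetch a URL with cURL, throwing on transport errors (hypothetical helper). */
function curlGet(string $url): string
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('cURL failed: ' . curl_error($handle));
    }
    return $body;
}

// Fake fetcher: fails on the first two calls, succeeds on the third.
$calls = 0;
$flaky = function () use (&$calls): string {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('temporary failure');
    }
    return '<html>ok</html>';
};

// Zero pause keeps the demo fast; a real source would use a few seconds.
$result = fetchWithRetry($flaky, 3, 0);
```

Against a real source, the fetcher could wrap the cURL helper, for example `fetchWithRetry(fn () => curlGet($url), 5, 10)` on PHP 7.4 or later, where `$url` is whatever page you are scraping.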
You can use the same instance to scrape several URLs that share the same structure.
Check out the examples folder for more details and fully working examples.