Download the PHP package rgasch/autoscraper without Composer
On this page you can find all versions of the php package rgasch/autoscraper. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package autoscraper
Short Description PHP port of the Python autoscraper library.
License MIT
Homepage https://github.com/rgasch/autoscraper
Information about the package autoscraper
AutoScraper
**_AutoScraper is a PHP class designed to scrape web pages and extract data based on predefined rules. This README provides examples of how to use the class to capture a scraping definition and then reuse this definition to scrape other similar pages._**
AutoScraper is a port of the Python AutoScraper library by Alireza Mika. It is intended to be compatible in its public API, but it contains some additions and changes to better fit the PHP ecosystem.
Installation
To install the AutoScraper class, use Composer: `composer require rgasch/autoscraper`
Usage
Capturing a Scraping Definition
To capture a scraping definition, you need to provide a URL and a wishlist of items you want to scrape from the page.
Reusing the Scraping Definition
Once you have captured and saved a scraping definition, you can reuse it to scrape other similar pages.
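The capture-then-reuse idea can be illustrated without the library. The following is a minimal sketch that uses only PHP's built-in DOM extension. It is not AutoScraper's API or its algorithm: the functions `captureRule` and `applyRule`, the rule format, and the sample HTML are all hypothetical and exist only to show how a rule learned from one wishlist item might be applied to a similar page.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of AutoScraper): find the first leaf element
 * whose trimmed text equals $wanted and describe it by tag name and class.
 */
function captureRule(string $html, string $wanted): ?array
{
    libxml_use_internal_errors(true);        // tolerate imperfect real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    foreach ($xpath->query('//*[not(*)]') as $node) {   // elements without child elements
        if (trim($node->textContent) === $wanted) {
            return [
                'tag'   => $node->nodeName,
                'class' => $node->getAttribute('class'),
            ];
        }
    }
    return null;                              // wishlist item not found on the page
}

/**
 * Hypothetical helper: apply a rule produced by captureRule() to another page.
 * Assumes the class value contains no double quotes.
 */
function applyRule(string $html, array $rule): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $query = $rule['class'] === ''
        ? '//' . $rule['tag']
        : sprintf('//%s[@class="%s"]', $rule['tag'], $rule['class']);

    $results = [];
    foreach ($xpath->query($query) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Invented sample pages that share the same layout.
$sample = '<html><body><h1 class="title">First Post</h1><p class="author">Ann</p></body></html>';
$other  = '<html><body><h1 class="title">Second Post</h1><p class="author">Bob</p></body></html>';

$rule  = captureRule($sample, 'First Post');   // learn the rule from one wishlist item
$found = applyRule($other, $rule);             // reuse it on a similar page
print_r($found);
```

Matching on tag name and class is deliberately simplistic. The point is only that a single wishlist item can be enough to derive a rule that generalizes to pages with the same structure.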
Methods
Captures a scraping definition based on the provided URL and wishlist.
- $url: The URL of the page to scrape.
- $wishlist: An array of items you want to scrape from the page. Usually a single item suffices.
Saves the captured scraping definition to a file.
- $filePath: The path to save the definition file.
Loads a previously saved scraping definition from a file.
- $filePath: The path to the definition file.
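Saving and loading a definition amounts to a serialization round trip. The sketch below assumes the definition is a plain array stored as JSON, which matches how the test commands are described, but the function names `saveDefinition` and `loadDefinition` and the array layout are hypothetical and are not the library's own methods or file format.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: write a definition array to a JSON file. */
function saveDefinition(array $definition, string $filePath): void
{
    $json = json_encode($definition, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);
    if (file_put_contents($filePath, $json) === false) {
        throw new RuntimeException("Could not write $filePath");
    }
}

/** Hypothetical helper: read a definition array back from a JSON file. */
function loadDefinition(string $filePath): array
{
    $json = file_get_contents($filePath);
    if ($json === false) {
        throw new RuntimeException("Could not read $filePath");
    }
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

// Round trip through a temporary file; the definition layout is invented.
$path       = sys_get_temp_dir() . '/autoscraper-definition-sketch.json';
$definition = ['selectors' => ['h1.title']];

saveDefinition($definition, $path);
$loaded = loadDefinition($path);
unlink($path);                                // clean up the temporary file
```

Using `JSON_THROW_ON_ERROR` turns malformed files into exceptions instead of silent `null` values, which makes a corrupted definition file easier to notice.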
Scrapes a page using the loaded scraping definition and returns the extracted data.
- $url: The URL of the page to scrape.
Returns the CSS selector after you have loaded a previously saved scraping definition.
Test Commands
There are two test commands that you can refer to in order to see actual use cases and to interactively test the AutoScraper class. These commands are:
This file prompts you for a URL and the text you wish to scrape, then saves the resulting CSS selector definitions to a JSON file in the resource directory.
This file allows you to re-use a previously saved CSS selector definition to scrape a new URL.
Tutorials
Refer to this gist for some advanced use cases and tutorials on how to use the AutoScraper class. This gist is (of course) based on the Python library, but it should illustrate how to use the PHP version as well.
Disclaimer: I have written some tests to verify the correctness of the PHP Library, but certainly haven't covered all areas of the functionality. It should work, but no guarantees are given. Besides, this is open source, so you know what that means (hint: pull requests are welcome).
License
This project is licensed under the MIT License.
All versions of autoscraper with dependencies
symfony/browser-kit Version ^7.2
symfony/dom-crawler Version ^7.2
symfony/http-client Version ^7.2
thecodingmachine/safe Version ^3.0