Download the PHP package marioungui/php-component-spider without Composer
On this page you can find all versions of the php package marioungui/php-component-spider. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: php-component-spider
Short description: A PHP package for scraping brand websites
License: MIT
Information about the package php-component-spider
PHP Component Spider
This PHP Component Spider is designed to scrape websites for specific components or search criteria defined by XPath filters. It uses the PHPScraper library to fetch and process web pages, and the League\Csv library to log the results in CSV files. This tool is easy to extend with custom XPath filters to meet various scraping needs.
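The fetch, filter, and log flow described above can be sketched roughly as follows. This is a minimal illustration, not the package's code: it swaps in PHP's built-in DOMDocument and DOMXPath for PHPScraper and `fputcsv` for League\Csv so that it runs without dependencies. The helper names `findMatches` and `logMatches`, the CSV column layout, and the output path are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Return the trimmed text of every node in $html matching $xpathFilter.
 * Hypothetical helper; the package itself relies on PHPScraper here.
 */
function findMatches(string $html, string $xpathFilter): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpathFilter);
    $matches = [];
    if ($nodes === false) {
        return $matches; // Invalid XPath expression.
    }
    foreach ($nodes as $node) {
        $matches[] = trim($node->textContent);
    }
    return $matches;
}

/**
 * Append one CSV row per match: page URL, component name, matched text.
 * The column layout is an assumption made for this sketch.
 */
function logMatches(string $csvPath, string $url, string $component, array $matches): void
{
    $handle = fopen($csvPath, 'a');
    foreach ($matches as $text) {
        fputcsv($handle, [$url, $component, $text], ',', '"', '\\');
    }
    fclose($handle);
}

// Inline sample markup stands in for a fetched page.
$html = '<html><body><div class="mvp-block">Hello</div><p>Other</p></body></html>';
$matches = findMatches($html, "//*[@class='mvp-block']");

$csvPath = sys_get_temp_dir() . '/spider-results.csv';
logMatches($csvPath, 'https://example.com/', 'MVP Block', $matches);
```

In a real run the HTML would come from the fetched page rather than an inline string.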
Features
- Scrape websites for specific components or text based on XPath filters.
- Log results into CSV files for further analysis.
- Configurable timeout and maximum redirects.
- Easy to extend with additional filters.
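As a rough illustration of what a configurable timeout and redirect limit can look like in plain PHP, the sketch below builds a stream context using the standard `http` context options `timeout`, `max_redirects`, and `follow_location`. The function name `buildFetchContext` and its default values are hypothetical. How the package passes these settings to PHPScraper is not shown in the source.

```php
<?php
declare(strict_types=1);

/**
 * Build a stream context with a request timeout and a redirect limit.
 * Hypothetical helper; name and defaults are illustrative only.
 *
 * @return resource
 */
function buildFetchContext(float $timeoutSeconds = 10.0, int $maxRedirects = 5)
{
    return stream_context_create([
        'http' => [
            'timeout'         => $timeoutSeconds, // Read timeout in seconds.
            'max_redirects'   => $maxRedirects,   // Upper bound on followed redirects.
            'follow_location' => 1,               // Follow Location headers.
        ],
    ]);
}

$context = buildFetchContext(15.0, 3);
$options = stream_context_get_options($context);

// A fetch with these limits would then look like:
// $html = file_get_contents($url, false, $context);
```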
Requirements
- PHP 8.1 or higher
- Composer
Build & Run from Source Code
- Clone the repository.
- Navigate to the project directory.
- Install the dependencies using Composer.
- Build the Phar package.
- Run the batch file spider.bat.
- Follow the on-screen instructions to select the component to search for and the domain to scrape.
Filters
The filters are defined in filters.php and use XPath to identify specific components on the web pages. Here are the current filters available:
Component | Index | Filter |
---|---|---|
MVP Block | 1 | //*[@class='mvp-block'] |
Smart Question Search Engine Block | 2 | //*[@class='sqe-block'] |
Related Articles Block | 3 | //h2[text()='Artigos relacionados' or text()='Artigos Relacionados' or text()='Articulos Relacionados' or text()='Articulos relacionados' ] |
Related Products Block | 4 | //h2[text()='Produtos Relacionados' or text()='Produtos Relacionados' or text()='Productos relacionados' or text()='Productos Relacionados'] |
Brands Block | 5 | //*[starts-with(@id, 'brands_block')]/@id |
Stages Block | 6 | //*[starts-with(@id, 'stages_block')] |
String Search | 7 | //*[contains(text(),'word')] |
Action Bar | 8 | //div[contains(@class, 'action-bar__wrapper')] |
Links Containing | 9 | //a[contains(@href, 'word')] |
Stages Block using From Library | 10 | //div[contains(@class, 'paragraph--type--stages-block')]//div[contains(@class, 'grid-col-10')] |
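To show how a few of these filters behave, the sketch below evaluates them against a small made-up HTML fragment using PHP's built-in DOMXPath (not PHPScraper). The sample markup and the `$counts` array are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Made-up sample page; element names and ids are illustrative only.
$html = <<<'HTML'
<html><body>
  <div class="mvp-block">MVP</div>
  <div id="brands_block_12">Brands</div>
  <div class="action-bar__wrapper sticky">Bar</div>
  <a href="/products/word-list">Link</a>
  <a href="/about">About</a>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // Keep parser warnings out of the output.
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// A subset of the filters from the table above.
$filters = [
    'MVP Block'        => "//*[@class='mvp-block']",
    'Brands Block'     => "//*[starts-with(@id, 'brands_block')]/@id",
    'Stages Block'     => "//*[starts-with(@id, 'stages_block')]",
    'Action Bar'       => "//div[contains(@class, 'action-bar__wrapper')]",
    'Links Containing' => "//a[contains(@href, 'word')]",
];

// Count how many nodes each filter selects on the sample page.
$counts = [];
foreach ($filters as $component => $filter) {
    $counts[$component] = $xpath->query($filter)->length;
}

// The Brands Block filter selects the id attribute itself, not the element.
$brandId = $xpath->query($filters['Brands Block'])->item(0)->nodeValue;
```

Note that `@class='mvp-block'` is an exact string comparison, so an element with additional classes would not match it, whereas the `contains(@class, ...)` filters match on substrings.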
Extending with Custom Filters
Extending the tool with new filters is simple:
- Open the filters.php file.
- Add a new case in the switch statement with your component name or index.
- Define the $component and $filter variables with your custom XPath.
Example:
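The original example was not preserved on this page, so the following is only a sketch of what such a case might look like. The wrapper function `resolveFilter`, the index 11, the "Newsletter Form" component, and its XPath are all hypothetical. Only the `$component` and `$filter` variable names and the use of a `switch` statement come from the steps above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the switch in filters.php: maps a selection
 * (index or component name) to a component label and an XPath filter.
 */
function resolveFilter(string $selection): array
{
    switch ($selection) {
        case '1':
        case 'MVP Block':
            $component = 'MVP Block';
            $filter = "//*[@class='mvp-block']";
            break;

        // New custom filter, added under an assumed free index.
        case '11':
        case 'Newsletter Form':
            $component = 'Newsletter Form';
            $filter = "//form[contains(@class, 'newsletter-form')]";
            break;

        default:
            throw new InvalidArgumentException("Unknown component: {$selection}");
    }

    return ['component' => $component, 'filter' => $filter];
}

$custom = resolveFilter('11');
```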
Contributing
Feel free to submit issues or pull requests if you have any improvements or new features you'd like to add.
License
This project is licensed under the MIT License.
All versions of php-component-spider with dependencies
- spekulatius/phpscraper: ^2.0
- symfony/http-kernel: ^5.4
- league/csv: ^9.8
- symfony/browser-kit: ^6.4