Download the PHP package zrashwani/arachnid without Composer
Arachnid Web Crawler
This library will crawl all unique internal links found on a given website up to a specified maximum page depth.
This library uses the symfony/panther and FriendsOfPHP/Goutte libraries to scrape site pages and extract the main SEO-related information, including title, h1 elements, h2 elements, statusCode, contentType, meta description, meta keyword, and canonicalLink.
This library is based on the original blog post by Zeid Rashwani here:
http://zrashwani.com/simple-web-spider-php-goutte
Josh Lockhart adapted the original blog post's code (with permission) for Composer and Packagist and updated the syntax to conform with the PSR-2 coding standard.
How to Install
You can install this library with Composer. Drop this into your composer.json manifest file:
{
    "require": {
        "zrashwani/arachnid": "dev-master"
    }
}
Then run composer install.
Getting Started
Basic Usage:
Here's a quick demo to crawl a website:
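A minimal sketch of a basic crawl, assuming the package was installed with Composer and that the crawler class is \Arachnid\Crawler, constructed with a base URL and a maximum depth. The method names traverse() and getLinksArray() follow the upstream README as best recalled, so check them against your installed version. The URL and depth are placeholders.

```php
<?php
// Assumes the package was installed with Composer in the current project.
require 'vendor/autoload.php';

$url = 'http://www.example.com'; // placeholder site to crawl
$linkDepth = 3;                  // maximum page depth to follow

// Assumption: by default the crawler uses the plain HTTP (Goutte) client.
$crawler = new \Arachnid\Crawler($url, $linkDepth);
$crawler->traverse();

// Assumption: getLinksArray() returns the collected link data as an array.
$links = $crawler->getLinksArray();
print_r($links);
```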
Enabling Headless Browser mode:
Headless browser mode can be enabled so that the crawler uses the Chrome engine in the background, which is useful for getting the contents of JavaScript-based sites.
The enableHeadlessBrowserMode method sets the scraping adapter to PantherChromeAdapter, which is based on the Symfony Panther library:
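A short sketch of switching to headless mode before crawling. It assumes that enableHeadlessBrowserMode() and traverse() return the crawler itself so the calls can be chained, and that getLinksArray() returns the results. The URL and depth are placeholders.

```php
<?php
require 'vendor/autoload.php';

$crawler = new \Arachnid\Crawler('http://www.example.com', 2);

// Assumption: each call returns the crawler, which allows chaining.
$links = $crawler->enableHeadlessBrowserMode() // use the Chrome-based adapter
    ->traverse()
    ->getLinksArray();

print_r($links);
```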
To use this mode, ChromeDriver must be installed on your machine. You can use dbrekelmans/browser-driver-installer to install ChromeDriver locally.
Advanced Usage:
You can set additional options on the underlying HTTP client, either by passing an array of options to the constructor or by creating an HTTP client scraper with the desired options:
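Both approaches might look like the sketch below. The option keys are standard Symfony HttpClient options. The CrawlingFactory class, its TYPE_HTTP_CLIENT constant, and the setScrapClient() method are recalled from the upstream README and should be treated as assumptions to verify against your installed version. The credentials are placeholders.

```php
<?php
require 'vendor/autoload.php';

use Arachnid\Adapters\CrawlingFactory; // assumed class location

// Option 1: pass client options as the third constructor argument.
$clientOptions = ['auth_basic' => ['username', 'password']]; // placeholder credentials
$crawler = new \Arachnid\Crawler('http://www.example.com', 2, $clientOptions);

// Option 2: build a scraping client with the desired options and set it explicitly.
$options = [
    'verify_host' => false,
    'verify_peer' => false,
    'timeout'     => 30,
];
// Assumption: create() takes an adapter type constant and an options array.
$scrapperClient = CrawlingFactory::create(CrawlingFactory::TYPE_HTTP_CLIENT, $options);
$crawler->setScrapClient($scrapperClient);
```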
You can inject a PSR-3 compliant logger object (such as Monolog) to monitor crawler activity:
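A sketch of wiring in Monolog, assuming monolog/monolog is installed separately and that the crawler exposes a setLogger() method accepting a PSR-3 logger. The channel name and log file path are illustrative.

```php
<?php
require 'vendor/autoload.php';

$crawler = new \Arachnid\Crawler('http://www.example.com', 2);

// Any PSR-3 logger should work; Monolog is used here as an example.
$logger = new \Monolog\Logger('crawler'); // channel name is arbitrary
$logger->pushHandler(
    new \Monolog\Handler\StreamHandler(sys_get_temp_dir() . '/crawler.log')
);

// Assumption: setLogger() accepts a Psr\Log\LoggerInterface instance.
$crawler->setLogger($logger);
$crawler->traverse();
```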
You can restrict the crawler to pages that meet specific criteria by passing a callback closure to the filterLinks method:
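For example, the following sketch crawls only links whose URL contains /blog. It assumes the callback receives the link URL in a form that preg_match can treat as a string and returns true for links that should be visited. The pattern and URL are illustrative.

```php
<?php
require 'vendor/autoload.php';

$crawler = new \Arachnid\Crawler('http://www.example.com', 2);

// Assumption: the closure is called once per discovered link and must
// return true to keep the link or false to skip it.
$links = $crawler->filterLinks(function ($link) {
        // Keep only links that contain "/blog" in their URL.
        return (bool) preg_match('/.*\/blog.*$/u', $link);
    })
    ->traverse()
    ->getLinks(); // assumed to return link objects rather than arrays
```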
You can use the LinksCollection class to get simple statistics about the links, as follows:
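A sketch of the kind of queries this allows. The class location \Arachnid\LinksCollection and the methods getBrokenLinks(), getByDepth(), and getExternalLinks() are recalled from the upstream README and are assumptions to confirm against your installed version.

```php
<?php
require 'vendor/autoload.php';

use Arachnid\LinksCollection; // assumed class location

$crawler = new \Arachnid\Crawler('http://www.example.com', 2);
$links = $crawler->traverse()->getLinks();

// Wrap the crawled links to query them.
$collection = new LinksCollection($links);

$brokenLinks   = $collection->getBrokenLinks();   // links that failed to load
$depth2Links   = $collection->getByDepth(2);      // links found at depth 2
$externalLinks = $collection->getExternalLinks(); // links pointing off-site
```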
How to Contribute
- Fork this repository
- Create a new branch for each feature or improvement
- Apply your code changes along with the corresponding unit tests
- Send a pull request from each feature branch
It is very important to separate new features or improvements into separate feature branches, and to send a pull request for each branch. This allows me to review and pull in new features or improvements individually.
All pull requests must adhere to the PSR-2 standard.
System Requirements
- PHP 7.2.0+
Authors
- Josh Lockhart https://github.com/codeguy
- Zeid Rashwani http://zrashwani.com
License
MIT Public License
Dependencies
- ext-spl: *
- tightenco/collect: ^v8.34
- guzzlehttp/psr7: ^1.4
- symfony/panther: ^1.0
- fabpot/goutte: ^4.0
- psr/log: ^1.1