1. Go to this page and download the thewinterwind/arachnid library, choosing the download type `require`.
2. Extract the ZIP file and open `index.php`.
3. Add this code to `index.php`:
```php
<?php
require_once('vendor/autoload.php');
/* Start to develop here. */
```
## thewinterwind/arachnid example snippets
```php
$url = 'http://www.example.com';
$linkDepth = 3;

// Initiate the crawl
$crawler = new \Arachnid\Crawler($url, $linkDepth);
$crawler->traverse();

// Get link data
$links = $crawler->getLinks();
print_r($links);
```
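The `$links` array can then be post-processed with ordinary PHP array functions. Below is a minimal sketch of filtering for broken links; the `$links` structure used here (URL keys, each with a `status_code` field) is a hypothetical stand-in for illustration, and the real array returned by `getLinks()` may differ between library versions:

```php
<?php
// Hypothetical $links structure for illustration only; the real array
// returned by getLinks() may be shaped differently.
$links = array(
    'http://www.example.com/'     => array('status_code' => 200, 'depth' => 1),
    'http://www.example.com/dead' => array('status_code' => 404, 'depth' => 2),
);

// Keep only the entries whose HTTP status indicates an error
$broken = array_filter($links, function ($info) {
    return $info['status_code'] >= 400;
});

print_r(array_keys($broken));
```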
## Advanced Usage

You can pass additional options to the underlying Guzzle client, either as an options array in the crawler constructor or via `setCrawlerOptions()`:
```php
// The third constructor parameter holds the options used to configure the Guzzle client
$crawler = new \Arachnid\Crawler('http://github.com', 2,
    ['auth' => array('username', 'password')]);

// Or use the separate setCrawlerOptions() method
$options = array(
    'curl' => array(
        CURLOPT_SSL_VERIFYHOST => false,
        CURLOPT_SSL_VERIFYPEER => false,
    ),
    'timeout' => 30,
    'connect_timeout' => 30,
);
$crawler->setCrawlerOptions($options);
```
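Because the options are a plain array, shared defaults and per-environment overrides can be combined before calling `setCrawlerOptions()`. A small standalone sketch (pure PHP, no crawler involved; the keys mirror the Guzzle options shown above):

```php
<?php
// Baseline options shared by every environment
$defaults = array(
    'timeout'         => 30,
    'connect_timeout' => 30,
);

// Merge in one environment's overrides; with string keys,
// array_merge lets the later array win on duplicates
$options = array_merge($defaults, array('timeout' => 60));

print_r($options);
```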
You can inject a [PSR-3](https://www.php-fig.org/psr/psr-3/) compliant logger object to monitor crawler activity (like [Monolog](https://github.com/Seldaek/monolog)):
```php
$crawler = new \Arachnid\Crawler($url, $linkDepth); // ... initialize crawler

// Set a logger for crawler activity (compatible with PSR-3)
$logger = new \Monolog\Logger('crawler logger');
$logger->pushHandler(new \Monolog\Handler\StreamHandler(sys_get_temp_dir().'/crawler.log'));
$crawler->setLogger($logger);
```