Download the PHP package fievel/webspider without Composer

On this page you can find all versions of the php package fievel/webspider. It is possible to download/install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After the download, include the autoloader once with require_once('vendor/autoload.php'); and then import the classes with use statements.

If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. Accessing such packages requires credentials: if a package comes from a private repository, insert the credentials in the auth.json textarea. You can look here for more information.

  • Some hosting environments cannot be accessed via a terminal or SSH, so Composer cannot be used there.
  • Using Composer is sometimes complicated, especially for beginners.
  • Composer needs a lot of resources, which are not always available on simple web hosting.
  • If you use private repositories, you don't need to share your credentials: you can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process: use our own command line tool to download the vendor folder as a binary. This makes your build faster, and you don't need to expose your credentials for private repositories.

Information about the package webspider

WebSpider

This repository wraps Guzzle and some Symfony components to provide an easy way to spider websites.

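Requirements

PHP >=5.5, Guzzle ^6.0, doctrine/orm >=2.2 and the Symfony framework-bundle, dom-crawler and css-selector components (>=2.7), as listed in the dependencies at the end of this page.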

Installation

Add fievel/webspider to the require section of your composer.json file, for example with:

composer require fievel/webspider

Usage

Extend the WebSpiderAbstract class, implementing the following methods as needed:

getDataFromResponse: used to extract data from the response; the default behaviour treats the body as plain text:

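// At the top of the spider's file: use Psr\Http\Message\ResponseInterface;
protected function getDataFromResponse(ResponseInterface $response)
{
    // Default behaviour: return the raw response body as a string.
    return (string) $response->getBody();
}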
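If a target returns JSON rather than HTML or plain text, getDataFromResponse could decode the body instead of returning it as a string. The following is a minimal sketch under that assumption; decodeJsonBody is a hypothetical helper, not part of webspider, and it only uses functions available since PHP 5.5 to match the package's requirement.

```php
<?php
// Hypothetical helper (not part of webspider). Inside a spider it could be
// used as: return decodeJsonBody((string) $response->getBody());
function decodeJsonBody($body)
{
    // true: decode JSON objects into associative arrays.
    $data = json_decode($body, true);

    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new \UnexpectedValueException('Invalid JSON body: ' . json_last_error_msg());
    }

    return $data;
}
```

parseData then receives an array and can read fields directly instead of going through DomCrawler.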

parseData: used to extract the desired information from the data returned by getDataFromResponse; Symfony DomCrawler can be initialized here if needed:

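protected function parseData($data)
{
    // $this->crawler must hold a Symfony DomCrawler (initialize it if needed).
    $this->crawler->addHtmlContent($data);

    // CSS selector filtering requires symfony/css-selector.
    $node = $this->crawler->filter('input');

    $value = null;
    if ($node->count() > 0) {
        // Read the value attribute of the first matching <input>.
        $value = $node->first()->attr('value');
    }

    return $value;
}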
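For comparison, the same extraction as the parseData example above (the value attribute of the first input element) can be written with PHP's built-in DOM extension, which is handy for testing the parsing logic without Symfony. This is an alternative sketch, not how WebSpiderAbstract works internally; firstInputValue is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the value attribute of the first <input>
// element in an HTML string, or null when there is no such input or the
// attribute is missing. Uses only PHP's built-in DOM extension.
function firstInputValue($html)
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $document = new DOMDocument();
    // Collect libxml warnings instead of printing them (real pages are often malformed).
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $inputs = $document->getElementsByTagName('input');
    if ($inputs->length === 0) {
        return null;
    }

    $first = $inputs->item(0);
    return $first->hasAttribute('value') ? $first->getAttribute('value') : null;
}
```

For example, firstInputValue('<form><input name="token" value="abc123"></form>') returns 'abc123'.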

handleException: used to handle exceptions thrown by Guzzle:

protected function handleException(\Exception $e)
{
    // Receives the exception thrown by Guzzle; this example simply returns null.
    return null;
}

The only remaining step is to launch the spider you created. To do that, use the SpiderManager service:

// At the top of the file: use GuzzleHttp\RequestOptions;
$manager = $this->container->get('fievel_web_spider.manager.spider');
$manager->setLogger($this->logger);

$response = null;
try {
    $response = $manager->runSpider([
        \AppBundle\Spiders\CustomSpider::class, // Spider class created
        'http://localhost/test-spider',         // URL to spider
        'post',                                 // HTTP method supported by Guzzle
        ['cookies' => true],                    // Custom config supported by the Guzzle Client
        [                                       // Custom request options supported by the Guzzle Client
            RequestOptions::FORM_PARAMS => [
                'full_name' => 'John Doe'
            ]
        ]
    ]);
} catch (\Exception $e) {
    // Handle or log the failure here instead of silently ignoring it.
}

Features

A storage object can be shared between subsequent spider calls:

$storage = new SpiderStorage();
$storage->add($sharedData);                 // $sharedData: data the spiders need to share

$response = $manager->runSpider([
    \AppBundle\Spiders\CustomSpider::class, // Spider class created
    'http://localhost/test-spider',         // URL to spider
    'post',                                 // HTTP method supported by Guzzle
    ['cookies' => true],                    // Custom config supported by the Guzzle Client
    [                                       // Custom request options supported by the Guzzle Client
        RequestOptions::FORM_PARAMS => [
            'full_name' => 'John Doe'
        ]
    ],
    $storage                                // Shared storage
]);

You can also build a queue of spider calls and leave the entire execution to the manager:

$queue = new SpiderCallQueue();

$queue->enqueue(
    \AppBundle\Spiders\FirstPageSpider::class,
    'http://localhost/test-spider',
    'post',
    ['cookies' => true],
    [
        RequestOptions::FORM_PARAMS => [
            'full_name' => 'John Doe'
        ]
    ]
);
$queue->enqueue(
    \AppBundle\Spiders\SecondPageSpider::class,
    'http://localhost/test-spider',
    'get',
    ['cookies' => true],
    []
);

$response = $manager->runSpiderQueue($queue);
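To make the queue idea concrete, here is a conceptual sketch of a FIFO runner in which every step sees the same shared storage, similar in spirit to SpiderStorage above. It assumes that queued calls run strictly in insertion order and that the last result is returned; SketchCallQueue is a hypothetical class and does not reflect the internals of SpiderCallQueue or SpiderManager.

```php
<?php
// Conceptual sketch only: SketchCallQueue is hypothetical and does not
// mirror SpiderCallQueue/SpiderManager internals. Steps run in FIFO order,
// all receive the same shared storage, and the last result is returned.
class SketchCallQueue
{
    /** @var SplQueue */
    private $steps;

    public function __construct()
    {
        $this->steps = new SplQueue();
    }

    public function enqueue(callable $step)
    {
        $this->steps->enqueue($step);
    }

    public function run(ArrayObject $storage)
    {
        $result = null;
        while (!$this->steps->isEmpty()) {
            $step = $this->steps->dequeue();
            // Each step sees the shared storage and the previous step's result.
            $result = $step($storage, $result);
        }
        return $result;
    }
}
```

A typical use is a multi-page flow: the first step stores a token extracted from page one, and the second step reads it from the shared storage when building its own request.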

Last but not least, the SpiderManager handles retries on failure using a custom Guzzle middleware.
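The idea behind retry middleware can be sketched without Guzzle: re-run a failed operation a limited number of times, waiting longer between attempts. The function below is a plain-PHP illustration of exponential backoff, not the webspider middleware; retryWithBackoff and its defaults (3 retries after the first attempt, 100 ms base delay) are hypothetical. Guzzle itself provides GuzzleHttp\Middleware::retry(), which takes a decider callable and an optional delay callable, for doing this at the HTTP layer.

```php
<?php
// Illustration of exponential backoff, NOT the webspider retry middleware.
// retryWithBackoff and its defaults are hypothetical.
// Makes one initial attempt plus up to $maxRetries retries.
function retryWithBackoff(callable $operation, $maxRetries = 3, $baseDelayMs = 100, $sleep = null)
{
    if ($sleep === null) {
        // Default: really wait, converting milliseconds to microseconds.
        $sleep = function ($ms) {
            usleep($ms * 1000);
        };
    }

    $attempt = 0;
    while (true) {
        try {
            return $operation($attempt);
        } catch (\Exception $e) {
            if ($attempt >= $maxRetries) {
                throw $e; // retries exhausted: surface the last failure
            }
            // Delays grow as base, 2x base, 4x base, ...
            $sleep($baseDelayMs * (int) pow(2, $attempt));
            $attempt++;
        }
    }
}
```

Injecting $sleep keeps the function testable without real waiting; when it is omitted, usleep() is used.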

All versions of webspider with dependencies

Requires:
  • php >=5.5
  • doctrine/orm >=2.2
  • guzzlehttp/guzzle ^6.0
  • symfony/css-selector >=2.7
  • symfony/dom-crawler >=2.7
  • symfony/framework-bundle >=2.7
