Download the PHP package simgroep/concurrent-spider-bundle without Composer

On this page you can find all versions of the php package simgroep/concurrent-spider-bundle. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After the download, you only need to add one include: require_once('vendor/autoload.php');. After that, import the classes with use statements.

If you use only one package, a project is not needed. But if you use more than one package, you cannot import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In this case, credentials are needed to access them. Please use the auth.json textarea to insert credentials if a package comes from a private repository. You can look here for more information.

  • Some hosting environments are not accessible via a terminal or SSH, so it is not possible to use Composer there.
  • Using Composer is sometimes complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web space.
  • If you use private repositories, you don't need to share your credentials: you can set up everything on our site and then give your team members a simple download link.
  • Simplify your Composer build process: use our own command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package concurrent-spider-bundle

Concurrent Spider Bundle


This bundle provides a set of commands to run a distributed web page crawler. Crawled web pages are saved to Solr.

Installation

Install it with Composer:

composer require simgroep/concurrent-spider-bundle dev-master

Then register it in the registerBundles() method of your AppKernel.php:

new Simgroep\ConcurrentSpiderBundle\SimgroepConcurrentSpiderBundle(),

You also need to install Xpdf (http://www.foolabs.com/xpdf/). Only its pdftotext utility is really required, and it must work from the command line:

/path_to_command/pdftotext pdffile.pdf
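To check that pdftotext is usable from PHP, a minimal sketch like the one below builds a safely quoted command and runs it with exec(). The helper names buildPdftotextCommand and extractPdfText and the path /usr/bin/pdftotext are hypothetical; the bundle itself may call pdftotext differently (symfony/process is among its dependencies). Passing "-" as the output file makes pdftotext write the text to standard output. The sketch assumes PHP 7.1 or later.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the bundle): build a shell command that
 * runs pdftotext on one PDF and writes the text to stdout ("-").
 */
function buildPdftotextCommand(string $binary, string $pdfFile): string
{
    // escapeshellarg() quotes each argument so spaces in paths are safe.
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdfFile) . ' -';
}

/**
 * Hypothetical helper: run pdftotext and return the extracted text line by
 * line joined with "\n", or null when the command exits with an error
 * (for example a missing binary or an unreadable PDF).
 */
function extractPdfText(string $binary, string $pdfFile): ?string
{
    $output = [];
    $exitCode = 0;
    exec(buildPdftotextCommand($binary, $pdfFile), $output, $exitCode);

    return $exitCode === 0 ? implode("\n", $output) : null;
}

// Usage, with an assumed install path:
// $text = extractPdfText('/usr/bin/pdftotext', 'pdffile.pdf');
```

A null result is a quick signal that pdftotext is not installed at the given path or cannot read the file.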

Configuration

Only minimal configuration is needed. The crawler must know the field mapping you use in Solr so it can save documents. The only mandatory part of the config is "mapping"; the other values are optional:

simgroep_concurrent_spider:
    http_user_agent: "PHP Concurrent Spider"

    rabbitmq.host: localhost
    rabbitmq.port: 5672
    rabbitmq.user: guest
    rabbitmq.password: guest

    queue.discoveredurls_queue: discovered_urls
    queue.indexer_queue: indexer

    solr.host: localhost
    solr.port: 8080
    solr.path: /solr

    mapping:
        id: #required
        title: #required
        content: #required
        url: #required
        tstamp: ~
        date: ~
        publishedDate: ~
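To illustrate what such a mapping could do, here is a hypothetical sketch that assumes each mapping entry pairs a crawler field (the key) with the name of a Solr field (the value), and that optional entries left as ~ (null) are skipped. The function mapToSolrDocument and the Solr field names in the usage are assumptions for illustration only, not the bundle's API. It assumes PHP 7.1 or later.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the bundle's code): rename crawled page fields to
 * Solr field names using a mapping shaped like the "mapping" config key,
 * assumed to be crawler field => Solr field.
 */
function mapToSolrDocument(array $mapping, array $page): array
{
    // These keys are marked "#required" in the configuration above.
    foreach (['id', 'title', 'content', 'url'] as $field) {
        if (empty($mapping[$field])) {
            throw new InvalidArgumentException("Mapping for '$field' is required.");
        }
    }

    $document = [];
    foreach ($mapping as $crawlerField => $solrField) {
        // Skip optional fields mapped to null (~ in YAML) or absent from the page.
        if ($solrField === null || !array_key_exists($crawlerField, $page)) {
            continue;
        }
        $document[$solrField] = $page[$crawlerField];
    }

    return $document;
}

// Usage with assumed Solr field names:
// mapToSolrDocument(
//     ['id' => 'id', 'title' => 'title_t', 'content' => 'content_t', 'url' => 'url_s', 'tstamp' => null],
//     ['id' => 'abc', 'title' => 'Home', 'content' => 'Hello', 'url' => 'https://example.com']
// );
```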

How does it work?

You start the crawler with:

app/console simgroep:start-crawler https://github.com

This adds one job to the queue to crawl the URL https://github.com. Then run the following process in the background to start crawling:

app/console simgroep:crawl

It's recommended to use a process-control tool to keep the crawler running in the background; we recommend Supervisord. You can run as many crawler processes as you like (and your machine can handle), but be careful not to flood the website: every process acts as a separate visitor on the website you're crawling.
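One common way to avoid flooding a site is to enforce a minimum interval between requests to the same host. The sketch below is hypothetical and not part of the bundle: secondsToWait and the interval value are assumptions. With several crawler processes, the last-request timestamp would have to live in storage shared by all of them, which this sketch leaves out. It assumes PHP 7.1 or later.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical politeness helper (not part of the bundle): given when a host
 * was last requested and a minimum interval in seconds, return how long a
 * worker should wait before requesting that host again.
 */
function secondsToWait(?float $lastRequestAt, float $now, float $minInterval): float
{
    if ($lastRequestAt === null) {
        return 0.0; // Host not visited yet: no need to wait.
    }

    // Wait for the rest of the interval, never a negative amount.
    return max(0.0, $lastRequestAt + $minInterval - $now);
}

// Usage: a worker would sleep before fetching, e.g.
// usleep((int) (secondsToWait($last, microtime(true), 2.0) * 1000000));
```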

Architecture

This bundle uses RabbitMQ to maintain a queue of URLs that should be indexed, and Solr to store the crawled web pages.
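The flow described above can be simulated in memory. In this sketch, SplQueue stands in for the RabbitMQ queues (named after the default discovered_urls and indexer queue settings), a plain array stands in for Solr, and a hard-coded link graph replaces real HTTP fetching. The two-queue split is an assumption based on those config keys; none of this is the bundle's own code. It assumes PHP 7.1 or later.

```php
<?php
declare(strict_types=1);

/**
 * In-memory simulation (not the bundle's code) of a queue-based crawl.
 * $linkGraph maps each URL to the links found on that page.
 * Returns the URLs "saved to Solr", in crawl order.
 */
function crawl(string $startUrl, array $linkGraph): array
{
    $discovered = new SplQueue(); // stand-in for the discovered_urls queue
    $indexer = new SplQueue();    // stand-in for the indexer queue
    $seen = [$startUrl => true];  // never queue the same URL twice
    $solr = [];                   // stand-in for the Solr index

    // Like simgroep:start-crawler: put one job on the queue.
    $discovered->enqueue($startUrl);

    // Like simgroep:crawl: consume jobs until the queue is empty.
    while (!$discovered->isEmpty()) {
        $url = $discovered->dequeue();
        $indexer->enqueue($url); // hand the crawled page on for indexing

        // Queue every newly discovered link.
        foreach ($linkGraph[$url] ?? [] as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $discovered->enqueue($link);
            }
        }
    }

    // Indexing step: save every crawled page.
    while (!$indexer->isEmpty()) {
        $solr[] = $indexer->dequeue();
    }

    return $solr;
}
```

Because the queues live outside the worker in the real setup, several crawl processes can consume the same queue in parallel, which is what makes the crawler distributed.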


All versions of concurrent-spider-bundle with dependencies

Requires:
  • php >=5.4.0
  • vdb/php-spider ^0.2
  • videlalvaro/php-amqplib ~2
  • symfony/symfony ^2.7||^3.0
  • phpoffice/phpword ^0.13,>=0.13.1
  • nelmio/solarium-bundle ^2.3
  • symfony/process ^3.1
  • predis/predis ^1.1
  • snc/redis-bundle ^2.0.1