Download the PHP package sobak/scrawler without Composer

On this page you can find all versions of the PHP package sobak/scrawler. It is possible to download/install these versions without Composer. Dependencies are resolved automatically.

FAQ

After the download, you have to add a single include: require_once('vendor/autoload.php');. After that you can import the classes with use statements.
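In code, this looks as follows (the imported class name is an assumption for illustration; check the package sources for the actual classes):

```php
<?php

// The imported class name is assumed for illustration;
// check the package sources for the actual classes.
use Sobak\Scrawler\Scrawler;

// Load the class autoloader shipped in the downloaded vendor folder
require_once('vendor/autoload.php');
```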

Example:
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In this case some credentials are needed to access such packages. Please use the auth.json textarea to insert credentials if a package comes from a private repository. You can look here for more information.
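For example, a minimal auth.json for a privately hosted repository using HTTP basic authentication looks like this (hostname and credentials are placeholders):

```json
{
    "http-basic": {
        "repo.example.com": {
            "username": "your-username",
            "password": "your-password-or-token"
        }
    }
}
```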

  • Some hosting environments are not accessible via a terminal or SSH, so it is not possible to use Composer there.
  • Using Composer can be complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on a simple webspace.
  • If you are using private repositories you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process: use our command line tool to download the vendor folder as a binary. This makes your build process faster and you don't need to expose your credentials for private repositories.

Information about the package sobak/scrawler

Scrawler


Scrawler is a declarative, scriptable web robot (crawler) and scraper which you can easily configure to parse any website and process the information into the desired format.

The configuration is based on building blocks, for which you can provide your own implementations, allowing for further customization of the process.

Install

As usual, start by installing the library with Composer:
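The command itself is not shown on this page; it is the standard Composer require for this package:

```bash
composer require sobak/scrawler
```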

Usage
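The configuration example itself is missing from this page. Below is a hypothetical sketch of what such a configuration could look like, reconstructed from the description that follows; apart from UrlListProvider, ResultWriter, and App\PostEntity, which this README mentions by name, every class and method below is an assumption, so consult the documentation for the real block classes:

```php
<?php

use App\PostEntity;

// NOTE: all class and method names below are assumptions for illustration,
// except UrlListProvider, ResultWriter, and App\PostEntity, which this
// README mentions by name. Check the documentation for the real blocks.
return (new Configuration())
    ->setBaseUrl('http://sobak.pl')
    // Crawl /page/2, /page/3, ... and stop at the first 404
    ->addUrlListProvider(new PageNumberUrlListProvider('/page/%u', 2))
    // Match every post on each page...
    ->addObjectDefinition('post', new CssSelectorListMatcher('article.post'), function ($object) {
        $object
            // ...map it to an entity...
            ->addEntityMapping(PostEntity::class)
            // ...and write each one to an individual <slug>.json file
            ->addResultWriter(PostEntity::class, new JsonFileResultWriter('%slug%.json'));
    });
```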

After saving the configuration file (perhaps as config.php), all you have to do is execute this command:
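The command did not survive on this page either. Assuming the console binary the package installs (it depends on symfony/console), the invocation presumably looks like this:

```bash
vendor/bin/scrawler config.php
```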

The example shown above will fetch the http://sobak.pl page, then iterate over all existing post pages (limited by the first 404 occurrence) starting from the 2nd, get all posts on each page, map them to App\PostEntity objects, and finally write the results down to individual JSON files using post slugs as filenames.

As you can see, with this short code, almost half of it being imports, you can easily achieve a quite tedious task for which you would otherwise need to pull in a few libraries, define rules to follow, provide a correct map to write down the files... Scrawler does it all for you!

Note: Scrawler does not aim to execute client-side code, by design. This is completely doable (look at headless Chrome, or even phantom.js if you like history) but I consider it out of scope for this project and have no interest in developing it. Thanks for understanding.

Documentation

For the detailed documentation please check the table of contents below.

If you are already familiar with the basic Scrawler concepts you will probably be most interested in the "Blocks" chapter. A block in Scrawler is an abstracted, swappable piece of logic defining the crawling, scraping, or result-processing operations, which you can customize using one of many built-in classes or even your own, tailored implementation. Looking at the example above, you could provide custom logic for UrlListProvider or ResultWriter (just two examples of the many available block types).
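As a sketch of the idea, a custom URL list provider might look roughly like this (the interface name, its namespace, and its method are assumptions, not the library's confirmed API):

```php
<?php

// Hypothetical custom block: interface name, namespace, and method are
// assumed for illustration; check the interfaces in the source for the
// real ones.
use Sobak\Scrawler\Block\UrlListProvider\UrlListProviderInterface;

class StaticUrlListProvider implements UrlListProviderInterface
{
    /** @var string[] */
    private $urls;

    public function __construct(array $urls)
    {
        $this->urls = $urls;
    }

    // Return the list of URLs the crawler should visit
    public function getUrls(): array
    {
        return $this->urls;
    }
}
```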

Note: I have to admit I am not a fan of excessive DocBlocks usage. That's why documentation in the code is sparse and focuses mainly on interfaces, especially the ones for creating custom implementations of blocks. Use the documentation linked above and, obviously, read the code.

Just be polite

Before you start tinkering with the library, please remember: some people do not want their websites to be scraped by bots. With a growing percentage of bandwidth being consumed by bots, it might not only be considered problematic from the business standpoint but also expensive to handle all that traffic. Please respect that. Even though Scrawler provides implementations for some blocks which might be useful to mimic an actual internet user, you should not use them to bypass anti-scraping measures taken by website owners.

Note: For testing purposes you can freely crawl my website, excluding its subdomains. Just please leave the default user agent.

License

Scrawler is distributed under the MIT license. For the details please check the dedicated LICENSE file.

Contributing

For the details on how to contribute please check the dedicated CONTRIBUTING file.


All versions of scrawler with dependencies

Requires:
  • php >=7.2.0
  • ext-curl *
  • ext-json *
  • bopoda/robots-txt-parser ^2.3
  • guzzlehttp/guzzle ^6.3
  • psr/log ^1.1
  • symfony/console ^4.2
  • symfony/css-selector ^4.2
  • symfony/dom-crawler ^4.2
Composer command for our command line client (download client): this client runs in every environment, you don't need a specific PHP version, and the first 20 API calls are free.
