Download the PHP package unique/scraper without Composer

On this page you can find all versions of the PHP package unique/scraper. These versions can be downloaded and installed without Composer; any dependencies are resolved automatically.

FAQ

After the download, you need a single include: require_once('vendor/autoload.php');. After that, import the classes with use statements.
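
As a minimal sketch of that include step, the snippet below assumes the downloaded vendor folder was unpacked next to the script; the class name in the comment is hypothetical and only shows where a use statement would go.

```php
<?php
// Assumption: the downloaded archive was unpacked next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    // One include registers the autoloader for every package in vendor/.
    require_once $autoload;
    $loaded = true;
} else {
    $loaded = false;
    echo "vendor/autoload.php not found; unpack the download first\n";
}

// After the include, classes are imported with use statements at the top
// of each file, for example (hypothetical class name):
// use Vendor\Package\SomeClass;
```

The is_file() guard is only there so the sketch fails with a readable message instead of a fatal error when the folder is missing.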

If you use only one package, a project is not needed. If you use more than one package, however, the classes cannot be imported with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, enter the credentials in the auth.json textarea.

  • Some hosting environments cannot be reached by a terminal or SSH, so Composer cannot be used there.
  • Using Composer is sometimes complicated, especially for beginners.
  • Composer needs a lot of resources, which are not always available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package scraper

Scraper

This is a helper component that eases the creation of custom website scrapers. It implements the basic logic of iterating over the listing pages and downloading the items. To use it, you must first implement your own ItemListDownloader (by extending AbstractItemListDownloader) and ItemDownloader (by extending AbstractItemDownloader or AbstractJsonItemDownloader) for your particular website.

Installation

This package requires PHP >= 7.4. To install the component, use Composer: composer require unique/scraper

Usage

In order to use this, you must first implement your own ItemListDownloader and ItemDownloader for your particular website.
Most scraping tasks (at least mine) consist of iterating over a list and scraping the items from it, so the scraper follows the same approach.
Maybe one day, as the need arises, I will expand it.

So, let's assume we have an ad website that has a list of ads. The listing is divided into however many pages, and each page has 20 ads. We need to scrape all the ads.

We first create a class that will represent our scraped Ad. It must implement SiteItemInterface.
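
As a rough sketch of what such a class might look like: the interface below is an empty stand-in declared locally so the snippet runs without the package installed. The real SiteItemInterface ships with unique/scraper and may require methods that this text does not list, and the Ad fields are hypothetical.

```php
<?php
declare(strict_types=1);

// Stand-in so the sketch runs on its own. The real SiteItemInterface comes
// from unique/scraper and may declare methods that are not shown here.
interface SiteItemInterface
{
}

// Plain value object for one scraped ad. All fields are hypothetical.
final class Ad implements SiteItemInterface
{
    private string $url;
    private string $title;
    private ?float $price;

    public function __construct(string $url, string $title, ?float $price = null)
    {
        $this->url = $url;
        $this->title = $title;
        $this->price = $price;
    }

    public function getUrl(): string
    {
        return $this->url;
    }

    public function getTitle(): string
    {
        return $this->title;
    }

    public function getPrice(): ?float
    {
        return $this->price;
    }
}

$ad = new Ad('https://example.com/ads/1', 'Used bicycle', 120.0);
echo $ad->getTitle(), ' (', $ad->getUrl(), ")\n";
```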

We then implement ItemListDownloader:
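
The paging logic such a class has to cover can be sketched without the package. The function below is not AbstractItemListDownloader; it is a hypothetical stand-alone helper that walks the listing page by page, with the page fetcher passed in as a callable so the example can run against in-memory data instead of a real website.

```php
<?php
declare(strict_types=1);

/**
 * Walks the listing page by page and yields every ad URL found.
 * $fetchPage is a hypothetical callable: it takes a 1-based page number
 * and returns the ad URLs on that page (an empty array past the last page).
 */
function iterateListing(callable $fetchPage, int $maxPages = 1000): Generator
{
    for ($page = 1; $page <= $maxPages; $page++) {
        $urls = $fetchPage($page);
        if ($urls === []) {
            return; // an empty page means the listing is exhausted
        }
        foreach ($urls as $url) {
            yield $url;
        }
    }
}

// Fake site for the demo: 45 ads, 20 per page, so the pages hold 20, 20 and 5 ads.
$allAds = [];
for ($i = 1; $i <= 45; $i++) {
    $allAds[] = "https://example.com/ads/$i";
}
$fetchPage = function (int $page) use ($allAds): array {
    return array_slice($allAds, ($page - 1) * 20, 20);
};

$found = iterator_to_array(iterateListing($fetchPage), false);
echo count($found), " ads found\n";
```

The $maxPages cap is a safety net against a site that never returns an empty page.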

Then we create a downloader for the ad itself:
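
The parsing part of such a downloader might look like the sketch below. It deliberately uses PHP's built-in DOM extension rather than symfony/dom-crawler so that it runs without any dependencies, it does not extend AbstractItemDownloader, and the CSS class names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Extracts the title and price from the HTML of one ad page.
 * The class names "ad-title" and "ad-price" are hypothetical.
 */
function parseAdHtml(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // string(...) returns the text of the first match, or '' if nothing matches.
    $title = trim($xpath->evaluate('string(//h1[@class="ad-title"])'));
    $price = trim($xpath->evaluate('string(//span[@class="ad-price"])'));

    return [
        'title' => $title,
        'price' => $price === '' ? null : (float) $price,
    ];
}

$html = '<html><body><h1 class="ad-title"> Used bicycle </h1>'
      . '<span class="ad-price">120.50</span></body></html>';
$ad = parseAdHtml($html);
echo $ad['title'], ': ', $ad['price'], "\n";
```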

Alternatively, you could extend AbstractJsonItemDownloader if the ad data is fetched as JSON.
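
For the JSON case, the decoding step could be sketched as follows. This is a stand-alone function, not a subclass of AbstractJsonItemDownloader, and the field names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Decodes one ad from a JSON response body.
 * The field names "id", "title" and "price" are hypothetical.
 */
function parseAdJson(string $body): array
{
    // JSON_THROW_ON_ERROR (PHP 7.3+) turns malformed input into a JsonException.
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data) || !isset($data['id'], $data['title'])) {
        throw new UnexpectedValueException('Ad JSON is missing required fields');
    }

    return [
        'id' => (int) $data['id'],
        'title' => (string) $data['title'],
        'price' => isset($data['price']) ? (float) $data['price'] : null,
    ];
}

$ad = parseAdJson('{"id": 7, "title": "Used bicycle", "price": 120.5}');
echo $ad['id'], ' ', $ad['title'], "\n";
```

Throwing on malformed or incomplete input keeps a broken response from silently turning into an empty ad.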

So that takes care of the scraping. All that's left now is to create a script, for example a console command, that initiates the scraping.

You can use the optional LogContainerConsole for logging stuff to the console, using two methods: stdOut() and stdErr(), which you need to implement yourself.
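
A console implementation of the two methods could be sketched like this. The class is a hypothetical stand-alone logger that does not extend LogContainerConsole, and the string parameter is an assumption, since the package's exact signatures are not shown here.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-alone logger; it does not extend the package's
// LogContainerConsole, and the method signatures are assumptions.
final class ConsoleLog
{
    /** @var resource */
    private $out;
    /** @var resource */
    private $err;

    // Streams can be injected, which makes the class easy to test.
    public function __construct($out = null, $err = null)
    {
        $this->out = $out ?? fopen('php://stdout', 'w');
        $this->err = $err ?? fopen('php://stderr', 'w');
    }

    public function stdOut(string $text): void
    {
        fwrite($this->out, $text . PHP_EOL);
    }

    public function stdErr(string $text): void
    {
        fwrite($this->err, 'ERROR: ' . $text . PHP_EOL);
    }
}

$log = new ConsoleLog();
$log->stdOut('Scraping page 1');
$log->stdErr('Could not download ad 42');
```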

Documentation

Events

You can subscribe to various events triggered by the AbstractItemListDownloader by using the on( string $event_name, callable $handler ) method. Each handler will receive an EventObject, which depends on the event type:
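
The subscription pattern can be illustrated with a minimal stand-in emitter. Only the on() signature and the event names come from the text above; the DemoEmitter class, its trigger() helper and the url property on the event object are hypothetical and are not the package's implementation.

```php
<?php
declare(strict_types=1);

// Minimal stand-in for an event-emitting downloader; not the package's class.
final class DemoEmitter
{
    /** @var array<string, callable[]> */
    private array $handlers = [];

    public function on(string $event_name, callable $handler): void
    {
        $this->handlers[$event_name][] = $handler;
    }

    // Hypothetical helper: calls every handler registered for the event.
    public function trigger(string $event_name, object $event): void
    {
        foreach ($this->handlers[$event_name] ?? [] as $handler) {
            $handler($event);
        }
    }
}

$seen = [];
$emitter = new DemoEmitter();
$emitter->on('on_item_end', function (object $event) use (&$seen): void {
    $seen[] = $event->url; // hypothetical property on the event object
});

$event = new stdClass();
$event->url = 'https://example.com/ads/1';
$emitter->trigger('on_item_end', $event);
$emitter->trigger('on_list_end', new stdClass()); // no handler registered, nothing happens

echo implode("\n", $seen), "\n";
```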

on_list_begin

The event object will be ListBeginEvent. This is a "breakable" event (read on to find out more). Methods:

on_list_end

The event object will be ListEndEvent. Methods:

on_item_begin

The event object will be ItemBeginEvent. This is a "breakable" event (read on to find out more). Methods:

on_item_end

The event object will be ItemEndEvent. Methods:

on_item_missing_url

The event object will be ItemMissingUrlEvent. Methods:

on_break_list

The event object will be BreakListEvent. Methods:

Breakable events

These are events that implement BreakableEventInterface and can instruct the scraper either to abort processing of the current item or to abort scraping of the entire list. In PHP terms, these correspond to continue and break in a while loop.
So a breakable event object implements these methods:
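
To illustrate the continue/break analogy, here is a minimal sketch that does not use the package's classes. The DemoBreakableEvent class and its two public flags are hypothetical and stand in for whatever BreakableEventInterface actually declares.

```php
<?php
declare(strict_types=1);

// Hypothetical breakable event. The two flags are assumptions, not the
// members declared by the package's BreakableEventInterface.
final class DemoBreakableEvent
{
    public string $url;
    public bool $skipItem = false;
    public bool $stopList = false;

    public function __construct(string $url)
    {
        $this->url = $url;
    }
}

// Processes the list, letting the handler skip an item or stop the list.
function scrape(array $urls, callable $onItemBegin): array
{
    $scraped = [];
    $i = 0;
    while ($i < count($urls)) {
        $event = new DemoBreakableEvent($urls[$i]);
        $i++;
        $onItemBegin($event);
        if ($event->stopList) {
            break;    // abort scraping of the entire list
        }
        if ($event->skipItem) {
            continue; // abort processing of this item only
        }
        $scraped[] = $event->url;
    }
    return $scraped;
}

$result = scrape(['a', 'b', 'c', 'd'], function (DemoBreakableEvent $e): void {
    if ($e->url === 'b') {
        $e->skipItem = true;
    }
    if ($e->url === 'd') {
        $e->stopList = true;
    }
});
echo implode(',', $result), "\n";
```

In this run the handler skips 'b' and stops the list at 'd', so only 'a' and 'c' are scraped.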

More Documentation

For more details on what each method does, check out the source code; it should be pretty clearly documented.


All versions of scraper with dependencies

Requires:
  • php >=7.4
  • ext-json *
  • symfony/dom-crawler ^5.0
  • unique/events ^1.0
  • guzzlehttp/guzzle ^7.2.0
