Laravel Scavenger
The most integrated web scraping package for Laravel.
Top Features
Scavenger provides the following features, and more, out of the box.
- Ease of use
  - Scavenger is easy to configure: publish the config file and set your targets.
- Scrape data from multiple sources at once.
- Convert scraped data into usable Laravel model objects.
  - For example, you may scrape an article, have it converted into an object of your choice, and save it to your database, where it is immediately available to your viewers.
- Perform one or more operations on each property of any scraped entity.
  - For example, you may call a paraphrase service from a model or package of your choice on data attributes before saving them to your database.
- Data integrity constraints
  - Scavenger uses a hashing algorithm of your choice to maintain data integrity. This hash ensures that one scrap (source article) is not converted into multiple output objects (model duplicates).
- Console Command
  - Once Scavenger is configured, a single Artisan command launches the seeker. Because it runs from the console rather than as a web request, it is more efficient and less prone to timeouts.
  - Artisan command: php artisan scavenger:seek
- Schedule ready
  - Scavenger can easily be set to scrape on a schedule, which makes building a largely autonomous website straightforward.
- SERP
  - Scavenger can be used to flexibly scrape Search Engine Result Pages (SERPs).
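The "Schedule ready" point relies on Laravel's task scheduler. A minimal sketch, assuming a Laravel 6-10 application whose app/Console/Kernel.php follows the default skeleton; the hourly frequency is an arbitrary choice for illustration, not a package recommendation:

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule)
    {
        // Launch the Scavenger seeker once an hour (frequency chosen arbitrarily).
        // withoutOverlapping() skips a run while the previous scrape is still executing.
        $schedule->command('scavenger:seek')
            ->hourly()
            ->withoutOverlapping();
    }
}
```

The scheduler itself still needs the usual cron entry that calls php artisan schedule:run every minute; without it, scheduled commands never fire.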
Installation
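- Install via Composer; in your terminal:
  composer require reliqarts/laravel-scavenger
  Alternatively, add the package to the require section of your composer.json, then run
  composer update
  in your terminal to pull it in.
- (Optional) Publish package resources and configuration.
  You may opt to publish only the configuration by using the scavenger-config tag:
  php artisan vendor:publish --tag=scavenger-config
  or only the migrations via the scavenger-migrations tag:
  php artisan vendor:publish --tag=scavenger-migrations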
Configuration
Scavenger is highly configurable. Once configured, the settings will be used for every scrape.
Structure
Below is an example of a typical config file structure, with comments explaining each setting.
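A minimal sketch of such a file, assuming it is published to config/scavenger.php and showing only the targets key with a single target built from keys described under Target Breakdown. The model class, URL and selectors are hypothetical, and the value shapes (for instance pager as a plain selector string) are assumptions that may not match the published config exactly:

```php
<?php

// config/scavenger.php (assumed path). Sketch only: other settings are omitted.
return [
    'targets' => [
        // Unique target identifier (hypothetical).
        'example-news' => [
            // Eloquent model created for each scrap (hypothetical class).
            'model' => App\Models\Article::class,
            // Listing page to scrape (placeholder URL).
            'source' => 'https://news.example.com/latest',
            // CSS selector for the "next" link (assumed to be a plain string).
            'pager' => 'a.next',
            // attributeName => CSS selector, applied to each item of the main list.
            'markup' => [
                'title' => 'h2.headline a',
                'summary' => 'p.teaser',
                // Selectors applied to the detail page opened from each list item.
                '__inside' => [
                    'body' => 'div.article-body',
                ],
            ],
        ],
    ],
];
```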
Target Breakdown
The targets array contains a list of entities (to be scraped from), keyed by a unique target identifier. Each target has the following structure:
- model: Laravel (Eloquent) model to create from the target.
- source: Source URL to scrape.
- search: Search settings. Use if a search must be performed before the target data is shown. (optional)
  - keywords: Array of keywords to search for.
  - keyword_input: Keyword input text markup.
  - form_markup: CSS selector for the search form.
  - submit_button_text: The text on the form's submit button.
- pager: CSS selector for the "next" link, used to move on to the next page.
- markup: Array of attributes to scrape from the main list. [attributeName => CSS selector]
  - __inside: Sub-markup for the detail page, that is, the page shown when an article title is clicked/opened. (optional)
- dissect: Split compound attributes into smaller attributes via regex. (optional)
- preprocess: Array of attributes which need to be preprocessed. [attributeName => callable] (optional)
- remap: Array of attributes which need to be renamed in order to be saved as target objects. [attributeName => newName] (optional)
- bad_words: Any scrap found containing these words will be discarded. (optional)
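To make the optional keys more concrete, here is a plain-PHP sketch of what dissect, preprocess, remap and bad_words do to a single scrap. It is not Scavenger's implementation: the function name processScrap, the order of the steps and the dissect shape ([sourceAttribute => [newAttribute => regex with one capture group]]) are all assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: applies the optional target keys to one scrap.
// Returns null when the scrap should be discarded.
function processScrap(array $scrap, array $target): ?array
{
    // bad_words: discard the scrap if any string attribute contains a listed word.
    foreach ($target['bad_words'] ?? [] as $word) {
        foreach ($scrap as $value) {
            if (is_string($value) && stripos($value, $word) !== false) {
                return null;
            }
        }
    }

    // dissect (assumed shape): split one attribute into new ones via regex capture groups.
    foreach ($target['dissect'] ?? [] as $source => $parts) {
        foreach ($parts as $newAttribute => $pattern) {
            if (isset($scrap[$source]) && preg_match($pattern, $scrap[$source], $m) === 1) {
                $scrap[$newAttribute] = trim($m[1]);
            }
        }
    }

    // preprocess: [attributeName => callable], applied to that attribute's value.
    foreach ($target['preprocess'] ?? [] as $attribute => $callable) {
        if (array_key_exists($attribute, $scrap)) {
            $scrap[$attribute] = $callable($scrap[$attribute]);
        }
    }

    // remap: [attributeName => newName], so keys match the target model's attributes.
    foreach ($target['remap'] ?? [] as $from => $to) {
        if (array_key_exists($from, $scrap)) {
            $scrap[$to] = $scrap[$from];
            unset($scrap[$from]);
        }
    }

    return $scrap;
}

// Hypothetical target options and scrap.
$target = [
    'dissect' => ['byline' => ['author' => '/By\s+([^|]+)/', 'date' => '/\|\s*(\d{4}-\d{2}-\d{2})/']],
    'preprocess' => ['title' => 'strtoupper'],
    'remap' => ['title' => 'headline'],
    'bad_words' => ['sponsored'],
];

$article = processScrap(['title' => 'Hello world', 'byline' => 'By Jane Doe | 2023-05-01'], $target);
print_r($article);
```

With this sample target, the byline is split into author and date, the title is upper-cased and stored as headline, and any scrap mentioning "sponsored" is dropped.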
Glossary of Terms
The following terms are used above.
- Daemon: User instance to be used by the Scavenger service.
- Scrap: Scraped data before it is converted into the target object.
- Target: Configured source-model mapping for a single entity.
- Target Object: Eloquent model object to be generated from a scrape.
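The data-integrity feature hinges on the Scrap/Target Object distinction: each scrap is hashed, and a scrap whose hash is already known is not turned into another target object. A minimal sketch of that idea in plain PHP (7.3+ for JSON_THROW_ON_ERROR); the helper names scrapHash and uniqueScraps, the choice of hashing every attribute and the in-memory set of seen hashes are hypothetical, whereas a real setup would persist hashes between runs:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: stable hash of a scrap using any algorithm accepted by hash().
function scrapHash(array $scrap, string $algorithm = 'sha256'): string
{
    // Sort keys so identical data hashes the same regardless of attribute order.
    ksort($scrap);
    return hash($algorithm, json_encode($scrap, JSON_THROW_ON_ERROR));
}

// Hypothetical helper: keeps only scraps whose hash has not been seen yet,
// so one source article never yields more than one target object.
function uniqueScraps(iterable $scraps, array &$seenHashes, string $algorithm = 'sha256'): array
{
    $fresh = [];
    foreach ($scraps as $scrap) {
        $hash = scrapHash($scrap, $algorithm);
        if (isset($seenHashes[$hash])) {
            continue; // already converted earlier
        }
        $seenHashes[$hash] = true;
        $fresh[] = $scrap;
    }
    return $fresh;
}

$seen = [];
$firstRun = uniqueScraps([
    ['title' => 'A', 'url' => 'https://example.com/a'],
    ['url' => 'https://example.com/a', 'title' => 'A'], // same data, different key order
    ['title' => 'B', 'url' => 'https://example.com/b'],
], $seen);
$secondRun = uniqueScraps([['title' => 'B', 'url' => 'https://example.com/b']], $seen);
```

Sorting keys before encoding keeps the hash stable when attributes arrive in a different order; $algorithm can be any name accepted by PHP's hash(), such as sha256 or sha512.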
Acknowledgements
This library is heavily inspired by, and dependent on, the Goutte library, although several concepts may have been adjusted.
Dependencies
- illuminate/support: 6 - 10
- monolog/monolog: 1.24 - 3
- fabpot/goutte: ^4.0
- ext-iconv: *
- ext-json: *
- ext-dom: *