PHP download

Download the PHP package baraja-core/webcrawler without Composer

On this page you can find all versions of the PHP package baraja-core/webcrawler. You can download and install these versions without Composer; any dependencies are resolved automatically.

Table of contents
Download baraja-core/webcrawler
More information about baraja-core/webcrawler
Files in baraja-core/webcrawler

Vendor: baraja-core
Package: webcrawler
Short description: Simple package to load list of urls and make sitemap.
License: MIT
Homepage: https://github.com/baraja-core/webcrawler

FAQ

After the download, include the autoloader once with `require_once('vendor/autoload.php');`. After that, import the classes you need with `use` statements.

If you use only one package, a project is not needed. But if you use more than one package, you cannot import the classes with `use` statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.

Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, please enter the credentials in the auth.json textarea.

Some hosting environments cannot be reached by a terminal or SSH, so Composer cannot be used there.
Composer can be complicated to use, especially for beginners.
Composer needs a lot of resources, which are not always available on simple web hosting.
If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
Simplify your Composer build process. Use our own command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.


Example code of baraja-core/webcrawler

Information about the package webcrawler


Web crawler

A simple library for crawling websites by following links, with minimal dependencies.

Czech documentation

📦 Installation

It's best to use Composer for installation, and you can also find the package on Packagist and GitHub.

To install, simply use the command `composer require baraja-core/webcrawler`.

You can use the package manually by creating an instance of the internal classes, or register a DIC extension to link the services directly to the Nette Framework.

How to use

The crawler can run without dependencies.

With the default settings, create an instance and call the `crawl()` method:
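A minimal sketch of that call, assuming the package was installed or unpacked into `./vendor` and that `Crawler` lives in the `Baraja\WebCrawler` namespace (the namespace is an assumption; it is not stated on this page). `https://example.com` is a placeholder URL, and the autoloader check only keeps the script from failing when the package is missing.

```php
<?php

declare(strict_types=1);

// Placeholder starting URL; replace it with the site you want to crawl.
$startUrl = 'https://example.com';
$result = null;

// Assumption: the package was installed or unpacked into ./vendor.
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    require_once $autoload;

    // Assumption: Crawler is in the Baraja\WebCrawler namespace.
    $crawler = new \Baraja\WebCrawler\Crawler();

    // Crawl from the starting URL with the default settings.
    $result = $crawler->crawl($startUrl);
    var_dump($result);
} else {
    echo "vendor/autoload.php not found; install the package first.\n";
}
```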

The `$result` variable will then hold an entity of type `CrawledResult`.

Advanced checking of multiple URLs

In a real-world case you often need to download multiple URLs within a single domain and check whether specific URLs work.

Simple example:
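A sketch of such a check. The method name `crawlList()`, its two-argument form (starting URL, then a list of additional URLs) and the `Baraja\WebCrawler` namespace are assumptions that this page does not confirm, so verify them against the installed version. All URLs are placeholders.

```php
<?php

declare(strict_types=1);

// Placeholder starting (main) URL.
$startUrl = 'https://example.com';

// Placeholder additional URLs to check within the same domain.
$additionalUrls = [
    'https://example.com/error-404',
    'https://example.com/contact',
];

$result = null;
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    require_once $autoload;

    $crawler = new \Baraja\WebCrawler\Crawler();

    // Assumed call: crawlList(starting URL, list of additional URLs).
    $result = $crawler->crawlList($startUrl, $additionalUrls);
    var_dump($result);
} else {
    echo "vendor/autoload.php not found; install the package first.\n";
}
```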

Note: The robots.txt file and the sitemap are downloaded automatically if they exist.

Settings

In the constructor of the `Crawler` service you can define your project-specific configuration.

For example:
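A sketch of passing options to the constructor. The option names come from the configuration table on this page and the values are illustrative. Wrapping the array in a `Baraja\WebCrawler\Config` object is an assumption that this page does not confirm.

```php
<?php

declare(strict_types=1);

// Illustrative values; every key is optional.
$options = [
    'followExternalLinks' => false,
    'sleepBetweenRequests' => 500,  // milliseconds
    'maxHttpRequests' => 200,
    'maxCrawlTimeInSeconds' => 60,
    'allowedUrls' => ['.+'],
];

$crawler = null;
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    require_once $autoload;

    // Assumption: the key-value array is passed via a Config object.
    $crawler = new \Baraja\WebCrawler\Crawler(
        new \Baraja\WebCrawler\Config($options)
    );
} else {
    echo "vendor/autoload.php not found; install the package first.\n";
}
```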

No value is required. Pass the options as a key-value array.

Configuration options:

Option	Default value	Possible values
`followExternalLinks`	`false`	`Bool`: Follow links that leave the given domain? By default the crawler stays within it.
`sleepBetweenRequests`	`1000`	`Int`: Sleep between requests, in milliseconds.
`maxHttpRequests`	`1000000`	`Int`: Crawler budget limit (maximum number of HTTP requests).
`maxCrawlTimeInSeconds`	`30`	`Int`: Stop crawling when this time limit is exceeded.
`allowedUrls`	`['.+']`	`String[]`: List of regular expressions describing allowed URL formats.
`forbiddenUrls`	`['']`	`String[]`: List of regular expressions describing banned URL formats.
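To illustrate how the `allowedUrls` and `forbiddenUrls` lists might interact, here is a self-contained sketch. The helper `isUrlAllowed()` is hypothetical and is not the library's code. Anchoring each pattern with `~^...$~`, letting forbidden patterns take precedence and ignoring empty patterns are all assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of baraja-core/webcrawler.
 *
 * @param string[] $allowed   regex bodies; the URL must match at least one
 * @param string[] $forbidden regex bodies; the URL must match none
 */
function isUrlAllowed(string $url, array $allowed, array $forbidden): bool
{
    foreach ($forbidden as $pattern) {
        // Skip empty patterns so that a default such as [''] bans nothing.
        if ($pattern !== '' && preg_match('~^' . $pattern . '$~', $url) === 1) {
            return false;
        }
    }

    foreach ($allowed as $pattern) {
        if (preg_match('~^' . $pattern . '$~', $url) === 1) {
            return true;
        }
    }

    return false;
}

// Placeholder patterns: stay on example.com, skip PDF files.
$allowed = ['https://example\.com/.*'];
$forbidden = ['.*\.pdf'];

var_dump(isUrlAllowed('https://example.com/blog', $allowed, $forbidden));     // bool(true)
var_dump(isUrlAllowed('https://example.com/file.pdf', $allowed, $forbidden)); // bool(false)
var_dump(isUrlAllowed('https://other.test/', $allowed, $forbidden));          // bool(false)
```

With the default values from the table (`['.+']` and `['']`), this sketch accepts every non-empty URL.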

📄 License

baraja-core/webcrawler is licensed under the MIT license. See the LICENSE file for more details.

All versions of webcrawler with dependencies

Version v1.3.3 Release 01. Aug 2023

Download

Download latest version of webcrawler from vendor baraja-core

Requires:
php: ^8.0
ext-curl: *
nette/utils: ^4.0
nette/http: ^3.0

