Download the PHP package shel/crawler without Composer

On this page you can find all versions of the PHP package shel/crawler. It is possible to download/install these versions without Composer; any dependencies are resolved automatically.

FAQ

After the download, you only need a single include: require_once('vendor/autoload.php');. After that you can import the classes with use statements.

Example:
If you use only one package, a project is not needed. But if you use more than one package, you cannot import the classes with use statements without a project.
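
A minimal entry script then looks roughly like this (the imported class name is only a placeholder for illustration, not a real class of shel/crawler):

    <?php
    // Load Composer's autoloader from the downloaded vendor folder
    require_once('vendor/autoload.php');

    // Import classes from the installed packages with use statements
    // (Acme\Demo\Client is only a placeholder name for illustration)
    use Acme\Demo\Client;

    // ... from here on you can use the imported classes, e.g. new Client()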

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In this case, credentials are needed to access such packages. Please use the auth.json textarea to insert credentials if a package comes from a private repository. You can look here for more information.

  • Some hosting environments are not accessible via a terminal or SSH, so Composer cannot be used there.
  • Using Composer can be complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on a simple webspace.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process: use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package crawler

Shel.Crawler for Neos CMS

Crawler for Neos CMS nodes and sites. It can be used to warm up the caches after a release or to dump your site as HTML files.

Installation

Run the following command in your project:

composer require shel/crawler

Usage

To crawl all pages based on a single sitemap, run:
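
(The command itself is not included on this page; the command and option names below are assumptions and should be verified with ./flow help.)

    # command and option names are assumptions, check ./flow help for the exact ones
    ./flow crawler:crawlsitemap --url https://example.com/sitemap.xml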

To crawl all pages based on all sitemaps listed in a robots.txt file, run:
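
(Also missing from this page; assuming a matching Flow command exists, it would be pointed at the robots.txt URL instead.)

    # command and option names are assumptions
    ./flow crawler:crawlsitemaps --url https://example.com/robots.txt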

Node based crawling

This command will try to generate all page HTML without making actual requests and instead renders the pages internally. Due to the complexity of the page context, this might not give the desired results, but the resulting HTML of all crawled pages can be stored for further usage.

This can be much faster, as all pages are rendered in one process and all caches are reused.

To make this work, you need to provide a valid hostname.

This can be done via one of the following ways:

To crawl all sites based on their primary active domain:
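
(The command block is missing here; the command name below is an assumption, check ./flow help for the exact one.)

    # command name is an assumption
    ./flow crawler:crawlsites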

To crawl all sites based on their primary active domain and use the URLs listed in robots.txt:
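
(Also missing and therefore an assumption; the option for reading the URLs from robots.txt is a guess and may be named differently.)

    # command and option names are assumptions
    ./flow crawler:crawlsites --useRobotsTxt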

Experimental static file cache

By providing the outputPath option you can store all crawled content as HTML files.
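
For example (the command and the exact option spelling are assumptions; the point is simply to pass a target directory):

    # outputPath points to the directory that will receive the HTML files; names are assumptions
    ./flow crawler:crawlsites --outputPath Web/cache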

You can actually use this as a super simple static file cache by adapting your webserver configuration. Here is an example for nginx:
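
(The original snippet is not included on this page; the following sketch only illustrates the idea. The /cache prefix and the PHP fallback are assumptions and have to match your actual outputPath and application setup.)

    location / {
        # serve a pre-rendered HTML file if the crawler stored one, otherwise fall back to the application
        try_files /cache$uri/index.html $uri $uri/ /index.php?$args;
    }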

Replace the existing try_files part with the given code and adapt the cache path if you use a different one. This cache feature is really experimental, and you are currently in charge of keeping the files up to date and removing old ones.

Contributing

Contributions or sponsorships are very welcome.


All versions of crawler with dependencies

Requires:
  • neos/neos: ^5.3 || ^7.3 || ^8.0
  • php: >=7.4
  • chuyskywalker/rolling-curl: ~3.1
  • ext-curl: *
  • ext-simplexml: *
  • ext-libxml: *
