Download the PHP package deravenedwriter/crawlengine without Composer

On this page you can find all versions of the PHP package deravenedwriter/crawlengine. You can download/install these versions without Composer; any dependencies are resolved automatically.

FAQ

After the download, you only need a single include: require_once('vendor/autoload.php');. After that, import the classes with use statements.

Example:
If you use only one package, a project is not needed. But if you use more than one package, the classes cannot be imported with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In that case, credentials are needed to access such packages. If a package comes from a private repository, please enter the credentials in the auth.json textarea. You can look here for more information.

  • Some hosting environments are not accessible via terminal or SSH, which makes it impossible to use Composer.
  • Using Composer can be complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on a simple webspace.
  • If you are using private repositories, you don't need to share your credentials. You can set everything up on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process: use our command-line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package crawlengine

Read Me

CrawlEngine

CrawlEngine is a PHP library that helps automate the process of logging into password-protected sites and extracting the needed information from them. It does this with the help of other great libraries like Guzzle and DomCrawler. License: MIT


Installation

The preferred way of installing CrawlEngine is with Composer, as follows:
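The original snippet is not shown in this excerpt; the standard command for this package would be:

    composer require deravenedwriter/crawlengine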

Then ensure your bootstrap file is loading the composer autoloader:
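That is, somewhere early in your bootstrap file:

    require_once 'vendor/autoload.php';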

Bootstrapping The Engine Class

The Engine class performs most of CrawlEngine's functions, including resolving requests, getting form details from pages, and more. The Engine can be initialized as follows:
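The original example is missing from this excerpt; a minimal sketch, assuming the root namespace DeravenedWriter\CrawlEngine (derived from the package name) and a no-argument constructor:

    <?php

    require_once 'vendor/autoload.php';

    use DeravenedWriter\CrawlEngine\Engine; // namespace assumed from the package name

    $engine = new Engine(); // constructor arguments, if any, are not shown in this excerpt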

Bootstrapping The InputDetail Class

An InputDetail instance describes an input tag of a form, which can look as follows:
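For illustration (the original snippet is not shown here), a typical input tag:

    <input type="text" name="username" placeholder="Enter your username">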

The InputDetail class is used to pass field values for a form to the Engine class, and it is also what is returned when the Engine is asked to get the form inputs of a given page.

It contains several properties, including name (the name of the input in question), type (the type of that input), and placeholder (its placeholder text).

We can initialize the InputDetail class as follows:
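The original snippet is missing here; a sketch under the assumption that the constructor takes the field name and a value to submit, plus the documented type and placeholder:

    use DeravenedWriter\CrawlEngine\InputDetail; // namespace assumed from the package name

    // Constructor order assumed: name, value, type, placeholder.
    $username = new InputDetail('username', 'john_doe', 'text', 'Enter your username');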

Getting InputTag Details from a Page Containing a Form

CrawlEngine has a way of accessing websites to analyze the input tags present. Say, for example, a website located at https://example.com/login has a page as shown:
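The original page markup is missing from this excerpt; a representative login page might look like:

    <html>
      <body>
        <form action="/login" method="POST">
          <input type="text" name="username" placeholder="Enter your username">
          <input type="password" name="password" placeholder="Enter your password">
          <button type="submit">Login</button>
        </form>
      </body>
    </html>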

We could get an array of all the input tags contained in this page as follows:
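A sketch of such a call, assuming getLoginFields accepts the page URI as a string (the excerpt does not show the exact signature); the name and type properties are documented above:

    // Returns an array of InputDetail objects, one per input tag in the form.
    $inputDetails = $engine->getLoginFields('https://example.com/login');

    foreach ($inputDetails as $detail) {
        echo $detail->name . ' (' . $detail->type . ')' . PHP_EOL;
    }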

As noted earlier, this function returns the input details of the first form found on a page. If there is more than one form, for example:
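For illustration, a page with a search form followed by a login form:

    <form action="/search" method="GET">
      <input type="text" name="query" placeholder="Search this site">
    </form>

    <form action="/login" method="POST">
      <input type="text" name="username" placeholder="Enter your username">
      <input type="password" name="password" placeholder="Enter your password">
    </form>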

the function would only return the inputs from the first form element. If you want to return values from the second form, specify an additional second argument to the getLoginFields function as follows:
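A sketch, assuming the second argument is the 1-based position of the target form on the page:

    // Fetch the input details of the second form instead of the first.
    $inputDetails = $engine->getLoginFields('https://example.com/login', 2);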

The above code would fetch form details for the second form on the page.

Resolving Requests with CrawlEngine

To make a request with CrawlEngine, one needs to know some things about the website being accessed: the URI of the form used to log in, the URI the form submits to, and the required fields in the form. So say, for example, the login form for a website is located at https://example.com/login and is structured as shown:
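The original markup is missing here; a representative form with a server-generated CSRF token might look like:

    <form action="https://example.com/login" method="POST">
      <input type="hidden" name="_token" value="d41d8cd98f00b204">
      <input type="text" name="username" placeholder="Enter your username">
      <input type="password" name="password" placeholder="Enter your password">
      <button type="submit">Login</button>
    </form>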

Above is what a typical login form looks like. From this form we can see that the URI the form is submitted to is https://example.com/login, and that we need a valid username and password to log in. We also see that the site generates a dynamic CSRF token to validate the request. You don't have to bother about this field, as CrawlEngine automatically takes care of it; nor do you have to bother about any field that has been pre-filled by the server, unless you wish to change it. When CrawlEngine makes its request, it fetches the form page, records all pre-filled input values, combines them with the ones you give it, and submits the request. So from the page above we know that we just have to give CrawlEngine a valid username and password.

The main function responsible for resolving requests is the resolveRequest method of the Engine class, used as shown:
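The original example is missing from this excerpt; a sketch of what such a call might look like, assuming resolveRequest takes the form page URI, the submission URI, the fields you supply, and the content pages to fetch, in that order (the signature is not shown here):

    use DeravenedWriter\CrawlEngine\Engine;      // namespaces assumed from the package name
    use DeravenedWriter\CrawlEngine\InputDetail;

    $engine = new Engine();

    // The fields we must supply ourselves; the CSRF token and any other
    // pre-filled inputs are handled by CrawlEngine automatically.
    $fields = [
        new InputDetail('username', 'john_doe', 'text', 'Enter your username'),
        new InputDetail('password', 'secret123', 'password', 'Enter your password'),
    ];

    // Pages to fetch while logged in.
    $contentPagesUri = ['https://example.com/dashboard'];

    // Parameter order assumed: form page URI, submission URI, fields, content pages.
    $crawlers = $engine->resolveRequest(
        'https://example.com/login',  // where the login form lives
        'https://example.com/login',  // where the form submits to
        $fields,
        $contentPagesUri
    );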

That's all you have to do; CrawlEngine does the rest of the magic. It visits the site, takes your given details along with any pre-filled ones found on the site that you didn't overwrite, and submits them. Then, while logged in like a normal user, it accesses all the contentPagesUri and brings the entire pages back as crawler objects. Let's say, for example, the https://example.com/dashboard page is as follows:
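A representative dashboard page, for illustration:

    <html>
      <body>
        <h1>Welcome back, John!</h1>
        <p>You have 3 new messages.</p>
      </body>
    </html>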

The resolveRequest function then returns an array of crawlers, one for each of the content pages given. So for our request above:
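Each returned element is a Symfony DomCrawler instance, so, continuing the illustration above, we could read values like this:

    // $crawlers[0] corresponds to the first entry in $contentPagesUri,
    // i.e. https://example.com/dashboard in this illustration.
    $dashboard = $crawlers[0];

    echo $dashboard->filter('h1')->text(); // "Welcome back, John!"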

For more information on crawlers and how to access different values in a page, you can check out the DomCrawler documentation.

By default, CrawlEngine reads the input fields from the first form it sees on the page containing the form. If there is more than one form on the login page it accesses, like the following:
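For illustration, a login page where a newsletter form precedes the login form:

    <form action="/subscribe" method="POST">
      <input type="email" name="email" placeholder="Subscribe to our newsletter">
    </form>

    <form action="https://example.com/login" method="POST">
      <input type="hidden" name="_token" value="d41d8cd98f00b204">
      <input type="text" name="username">
      <input type="password" name="password">
    </form>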

then by default CrawlEngine would reference the first form, so the CSRF token and other pre-filled inputs would come from the first form. To specify that the request is for the second form, add an extra parameter to the resolveRequest method as follows:
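A sketch, again assuming the extra argument is the 1-based position of the form to use:

    // Target the second form on the login page instead of the first.
    $crawlers = $engine->resolveRequest(
        'https://example.com/login',
        'https://example.com/login',
        $fields,
        $contentPagesUri,
        2 // assumed: 1-based position of the form to use
    );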

The above tells CrawlEngine that you are not referring to the first form on the page but the second one.


All versions of crawlengine with dependencies

Requires:

  • symfony/dom-crawler ^5.1
  • symfony/css-selector ^5.1
  • guzzlehttp/guzzle ^7.0

Composer command for our command-line client (download client): this client runs in every environment, you don't need a specific PHP version, and the first 20 API calls are free.
