Download the PHP package openbuildings/spiderling without Composer

On this page you can find all versions of the PHP package openbuildings/spiderling. You can download and install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After the download, add a single include, require_once('vendor/autoload.php');, and then import the classes with use statements.

Example:
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In that case, credentials are needed to access them. If a package comes from a private repository, use the auth.json textarea to enter the credentials.

  • Some hosting environments cannot be accessed through a terminal or SSH, which makes it impossible to use Composer there.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are not always available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package spiderling

Spiderling


This is a library for crawling web pages with curl and PhantomJS, heavily inspired by Capybara. It is a major component of phpunit-spiderling for integration-level testing. It handles AJAX requests and lets you switch from fast PHP-only drivers to JavaScript-enabled ones such as PhantomJS without modifying your code.

A quick example

This will output the text content of the HTML node li.test, fill in some inputs, and submit the form.

The DSL

The Page object has a rich DSL for accessing content and filling in forms:

Navigation

Getters

Each node represents an HTML tag on the page, and you can use an extensive set of getter methods to probe its contents. All of these getters are dynamic: no cache is involved, and each method sends a call to the appropriate driver.

An example of using some of these getters, if we have this page:

Then you could write the following PHP code:

Setters

Spiderling also gives you the ability to modify the current page, filling in input fields, pressing buttons and links, submitting forms. This can be accomplished with the low level setters:

So having an example form like this:

We could do this script:

Locators

By default, elements are found by CSS selector, but input elements, buttons and links also have special finders. The kind of lookup used is referred to as the "locator type".

All of these locator types let you scan the page and select what you want to click or fill in without looking at the HTML of the page at all. Anywhere a selector is accepted, you can pass an array('{locator type}', '{selector}') to override the default locator type.
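The rule that a bare string uses the default locator type while an array overrides it can be sketched in plain PHP. This is only an illustration and not Spiderling's own code: `normalize_locator` is a hypothetical helper, and the locator type names used below ('css', 'link') are assumptions.

```php
<?php
// Hypothetical helper, not part of Spiderling: turn a selector into a
// [locator type, selector] pair. A bare string gets the default type.
function normalize_locator($selector, string $default_type = 'css'): array
{
    if (is_array($selector)) {
        // array('{locator type}', '{selector}') overrides the default type.
        return [(string) $selector[0], (string) $selector[1]];
    }

    return [$default_type, (string) $selector];
}

// A plain string is treated as a CSS selector (the default).
$plain = normalize_locator('li.test');

// An array names the locator type explicitly ('link' is an assumed type name).
$typed = normalize_locator(array('link', 'Sign in'));

echo json_encode($plain), "\n";
echo json_encode($typed), "\n";
```

Normalizing once at the entry point means every finder can accept both forms without repeating the check.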

Here's an example using the previous HTML:

The PHP code becomes clearer and less brittle - the underlying HTML can change but your code will still work as expected:

Filters

If using only locators is not enough, you can easily narrow down the search with "filters". They iterate over the found candidates, filtering out ones that don't match. Be careful with them because they load the nodes and check them one-by-one which might be performance intensive, but it is OK in most cases.
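To make the cost described above concrete, here is a minimal sketch in plain PHP using the native DOM extension. It does not use Spiderling's filter API; `filter_by_text` is a hypothetical helper and the HTML is made up for the illustration. The point is that a filter has to load and inspect every candidate the locator returned.

```php
<?php
// Made-up markup for the illustration.
$html = '<ul><li class="row">Row 1</li><li class="row">Row 2</li><li class="row">Other</li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Step 1: the locator finds every candidate node.
$candidates = iterator_to_array($xpath->query('//li[@class="row"]'));

// Step 2: a filter checks the candidates one by one and drops non-matches.
// Hypothetical filter: keep nodes whose text contains $needle.
function filter_by_text(array $nodes, string $needle): array
{
    return array_values(array_filter(
        $nodes,
        function (DOMNode $node) use ($needle) {
            return strpos($node->textContent, $needle) !== false;
        }
    ));
}

$rows = filter_by_text($candidates, 'Row');

foreach ($rows as $row) {
    echo $row->textContent, "\n";
}
```

With a remote driver each per-node check would be a separate call, which is why a narrower locator is usually cheaper than a broad locator plus a filter.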

Here are the available filters:

Here is how you might use the filters with this HTML:

Finders

Most locator types have a custom method for finding an element with that particular type. There are also some other custom finders which you might find useful:

The previous form example can be rewritten like this:

Actions

Some frequently used actions that you can perform on the page (modifying inputs, clicking links and buttons, etc.) have shortcut methods that make your code more readable and robust.

Here are all these actions:

Using these methods you can make your code very readable. Also all of these actions return $this, allowing you to chain them easily. Consider the previous example in the Finders section - you can rewrite it like this:

A more complicated example is in order. We will be using the following HTML:

Nesting

When there are multiple matching elements on the page, you might want to be more specific. Spiderling allows this by nesting nodes: you can call all the actions and finders from "within" a node, so that finders search only the children of that node.

For example:

Notice the "end()" method: it returns you to the previous level so you can continue your work from there. You can also nest multiple times without any problem (you will then need to call "end()" once per level to get out of the nesting).
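The mechanics behind this kind of nesting can be sketched with a small fluent class in which each nested scope remembers its parent. This is an illustration only: `ScopedNode`, `within()` and `act()` are hypothetical names, not Spiderling's API. Only the idea of `end()` returning to the previous level comes from the text above.

```php
<?php
// Hypothetical class for illustration; it is not part of Spiderling.
final class ScopedNode
{
    private $path;
    private $parent;

    public function __construct(string $path, ?ScopedNode $parent = null)
    {
        $this->path = $path;
        $this->parent = $parent;
    }

    // Enter a nested scope; later calls are relative to it.
    public function within(string $selector): ScopedNode
    {
        return new ScopedNode($this->path . ' ' . $selector, $this);
    }

    // An action returns $this, so calls can be chained.
    public function act(string $what): ScopedNode
    {
        echo '[' . $this->path . '] ' . $what . "\n";
        return $this;
    }

    // Leave the current scope and continue from the parent.
    public function end(): ScopedNode
    {
        return $this->parent ?? $this;
    }
}

$page = new ScopedNode('page');

// Two levels of nesting need two end() calls to get back to the page.
$last = $page
    ->within('form#signup')
        ->act('fill in Name')
        ->within('fieldset.address')
            ->act('fill in City')
        ->end()
        ->act('press Submit')
    ->end();
```

After the final `end()`, `$last` is the original `$page` object, so the chain can continue at page level.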

Misc

There are some additional methods as part of the DSL:

Handling AJAX

Spiderling follows the same philosophy as Capybara: it does not explicitly support or wait for AJAX calls to finish. However, a finder does not immediately conclude failure when an element is not yet loaded; it waits a bit (2 seconds by default) before throwing an exception. To take advantage of this in your crawlers, search for the change that the AJAX request is about to make:

For example:
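A minimal sketch of the waiting behaviour in plain PHP. It does not use Spiderling's API and is not its implementation: `wait_for` is a hypothetical helper, and the polling interval is an assumption. The lookup is retried until it succeeds or the timeout (2 seconds here, matching the default mentioned above) runs out, and only then is an exception thrown.

```php
<?php
// Hypothetical helper: call $attempt repeatedly until it returns a
// non-null value, or throw once $timeout seconds have passed.
function wait_for(callable $attempt, float $timeout = 2.0, int $interval_ms = 50)
{
    $deadline = microtime(true) + $timeout;

    do {
        $result = $attempt();
        if ($result !== null) {
            return $result;
        }
        // Pause briefly before looking again.
        usleep($interval_ms * 1000);
    } while (microtime(true) < $deadline);

    throw new RuntimeException('Element not found within ' . $timeout . ' seconds');
}

// Simulated page: the element "appears" on the third lookup,
// as if an AJAX request had just finished.
$calls = 0;
$element = wait_for(function () use (&$calls) {
    $calls++;
    return $calls >= 3 ? '<li class="loaded">' : null;
});

echo $element, ' found after ', $calls, " attempts\n";
```

Because the finder itself waits, the calling code only has to look for the element that the AJAX response adds; no explicit sleep is needed.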

Drivers

A great strength of Spiderling is the ability to use different drivers for your code. This allows switching from PHP-only curl parsing of the page to PhantomJS without modifying the code. For example, to use the PhantomJS driver instead of the default "Simple" one, you would do this:

There are 4 drivers at present:

You can easily write your own Drivers by extending the Driver class and implementing methods yourself. Some drivers do not support all the features, so it's OK to not implement every method.

Now for each driver in detail:

Driver_Simple

Loads the HTML page with curl and then parses it using PHP's native DOM and XPath. All finders are quite fast, so it's your best bet to use this if you do not rely on JavaScript or other browser specific features. It's also very easy to extend in order to make a "native" version for a specific web framework - the only thing you need to implement is the loading part, an example of which you can see with the "Driver_Kohana" class.
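The parsing half of that approach can be sketched with PHP's built-in DOM classes alone. This is not Driver_Simple's code: a static string stands in for the curl response, and the XPath expression is written by hand (the package lists symfony/css-selector as a dependency, which suggests CSS selectors are translated to XPath, but that is an assumption here).

```php
<?php
// A static string stands in for a page fetched with curl.
$html = '<html><body><ul><li class="test">First</li><li>Second</li></ul></body></html>';

$dom = new DOMDocument();

// Collect libxml warnings internally instead of emitting them;
// real-world markup is often imperfect.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Hand-written XPath for the CSS selector "li.test". It matches only
// when "test" is the element's entire class attribute.
$nodes = $xpath->query('//li[@class="test"]');

echo $nodes->item(0)->textContent, "\n";
```

Everything happens in-process on an already parsed tree, which is why this style of driver is fast but cannot run JavaScript.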

Before each request $_GET, $_POST and $_FILES are saved, filled in with appropriate values and later restored, mimicking a real PHP request.
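The save, fill and restore cycle can be sketched as follows. This is a plain PHP illustration, not the driver's code: `with_request_globals` is a hypothetical helper (the listed openbuildings/environment-backup dependency presumably covers this in the real package, which is an assumption).

```php
<?php
// Hypothetical helper: run $request with temporary superglobal values,
// then put the original values back, even if $request throws.
function with_request_globals(array $get, array $post, callable $request)
{
    // Save the current state.
    $saved = ['get' => $_GET, 'post' => $_POST, 'files' => $_FILES];

    // Fill in the values for the simulated request.
    $_GET = $get;
    $_POST = $post;
    $_FILES = [];

    try {
        return $request();
    } finally {
        // Restore the original state.
        $_GET = $saved['get'];
        $_POST = $saved['post'];
        $_FILES = $saved['files'];
    }
}

$_GET = ['page' => '1'];

// Inside the callback the code sees the simulated query string.
$seen = with_request_globals(['q' => 'spiderling'], [], function () {
    return $_GET['q'];
});

echo $seen, "\n";
echo json_encode($_GET), "\n";
```

Restoring in a `finally` block keeps the surrounding script's state intact when the simulated request fails.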

Apart from loading the HTML through curl, you could set the content directly, if you've loaded it by other means.

Here's how that looks:

Performing POST requests yourself is generally discouraged, as they are not supported by all the drivers. With Driver_Simple, however, you can perform arbitrary requests, for example to test API calls. This is done directly through the driver, like this:

Driver_Kohana

Uses the Kohana framework's native internal request (slightly modified to trick the framework into thinking it's an initial request). It extends Driver_Simple.

It also handles redirects, capping them at a maximum of 8 (configurable), and uses Request::$user_agent as its user agent.

Example Use

Driver_Phantomjs

Using this driver you can perform all the finds and actions with PhantomJS, using a real WebKit engine with JavaScript, without the need for any graphical environment (headless). PhantomJS must be installed and available in your PATH, so that it can be invoked as "phantomjs".

You can download it from here: http://phantomjs.org/download.html

By default it spawns a new server on a random port between 4445 and 5000.

This should work if you have PhantomJS installed.

If you want to start the server independently, you can modify the PhantomJS connection. You can also set it up to output messages to a log file and tweak other parameters.

Setting the "pid file" argument on start allows the driver to save the pid of the PhantomJS server process to that file and then try to clean up the old server when it is started again, so you don't end up with stray PhantomJS processes running.
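The pid file idea can be sketched in plain PHP. This is a hedged illustration, not the driver's actual cleanup logic: `read_stale_pid` and `start_with_pid_file` are hypothetical helpers, the pids are made up, and the kill step is passed in as a callable so that the sketch runs without touching real processes.

```php
<?php
// Hypothetical helper: return the pid stored in $pid_file, or null.
function read_stale_pid(string $pid_file): ?int
{
    if (!is_file($pid_file)) {
        return null;
    }

    $pid = (int) trim((string) file_get_contents($pid_file));

    return $pid > 0 ? $pid : null;
}

// Hypothetical helper: clean up a server left over from a previous run,
// then record the pid of the newly started one.
function start_with_pid_file(string $pid_file, int $new_pid, callable $kill): void
{
    $stale = read_stale_pid($pid_file);
    if ($stale !== null) {
        // A real implementation might call posix_kill($stale, 15) here.
        $kill($stale);
    }

    file_put_contents($pid_file, (string) $new_pid);
}

$pid_file = tempnam(sys_get_temp_dir(), 'phantomjs_pid_');
$killed = [];
$kill = function (int $pid) use (&$killed) {
    $killed[] = $pid;
};

// First start: the file is empty, so there is nothing to clean up.
start_with_pid_file($pid_file, 1111, $kill);
// Second start: the pid from the first run is found and cleaned up.
start_with_pid_file($pid_file, 2222, $kill);

$current = read_stale_pid($pid_file);
unlink($pid_file);

echo 'cleaned up: ', json_encode($killed), ', current pid: ', $current, "\n";
```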

License

Copyright (c) 2012-2013, OpenBuildings Ltd. Developed by Ivan Kerin as part of clippings.com

Licensed under the BSD-3-Clause license; see the LICENSE file.


All versions of spiderling with dependencies

Package requirements:
  • php: ^7.1
  • symfony/css-selector: ^2.3|^3.0
  • openbuildings/environment-backup: ~0.1.1