
Deprecated / Being Phased Out - please prefer using API calls directly, they have been simplified for max usability!


Diffbot PHP API Wrapper

This package is a slightly overengineered Diffbot API wrapper. It uses PSR-7 and PHP-HTTP friendly client implementations to make API calls. To learn more about Diffbot, see their homepage. Right now it supports the Analyze, Product, Image, Discussion, Crawl, Search, and Article APIs, but can also accommodate Custom APIs. Video and Bulk API support is coming soon.

Full documentation available here.

Requirements

Minimum PHP 5.6 is required. PHP 7.0 is recommended.

This package uses some non-stable packages, so you must set your project's minimum stability to something like beta or dev in composer.json:
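A minimal composer.json fragment for this might look as follows (prefer-stable keeps Composer picking stable releases wherever possible):

```json
{
    "minimum-stability": "dev",
    "prefer-stable": true
}
```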

If you don't, the installation procedure below will fail.

Install

The library depends on an implementation of the client-implementation virtual package. If you don't know what this means, simply requiring the Guzzle6 adapter will do:
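With a standard Composer setup, that's:

```shell
composer require php-http/guzzle6-adapter
```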

This adapter satisfies the requirement for client-implementation (see above) and will make it possible to install the client with:
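For example:

```shell
composer require swader/diffbot-php-client
```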

Usage - simple

Simplest possible use case:
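A minimal sketch - the token and URL are placeholders, and getTitle() is proxied to the single Article entity in the returned collection:

```php
<?php

require_once 'vendor/autoload.php';

use Swader\Diffbot\Diffbot;

// Token is a placeholder - get yours at diffbot.com
$diffbot = new Diffbot('my_token');

$url = 'http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/';

echo $diffbot->createArticleAPI($url)->call()->getTitle();
```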

That's it, this is all you need to get started.

Usage - advanced

Full API reference manual in progress, but the instructions below should do for now - the library was designed with brutal UX simplicity in mind.

Setup

To begin, always create a Diffbot instance. A Diffbot instance will spawn API instances. To get your token, sign up at http://diffbot.com
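For example (the token value is a placeholder):

```php
use Swader\Diffbot\Diffbot;

$diffbot = new Diffbot('my_token');
```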

Pick API

Then, pick an API.

Currently available automatic APIs are:

  • Article API
  • Product API
  • Image API
  • Discussion API
  • Analyze API

Video is coming soon. See below for instructions on Crawlbot, Search and Bulk API.

There is also support for Custom APIs - unless otherwise configured, they return instances of the Wildcard entity.

All APIs can also be tested on http://diffbot.com

The API you picked can be spawned through the main Diffbot instance:
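A sketch of the factory methods, assuming a Diffbot instance created as in Setup above (the URL is a placeholder):

```php
$url = 'http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/';

$articleApi    = $diffbot->createArticleAPI($url);
$productApi    = $diffbot->createProductAPI($url);
$imageApi      = $diffbot->createImageAPI($url);
$discussionApi = $diffbot->createDiscussionAPI($url);
$analyzeApi    = $diffbot->createAnalyzeAPI($url);
```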

API configuration

All APIs have some optional fields you can pass with parameters. For example, to extract the 'meta' values of the page alongside the normal data, call setMeta:
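For example, on an Article API instance:

```php
$articleApi->setMeta(true);
```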

Some APIs have other flags that don't qualify as fields. For example, the Article API can be told to ignore Discussions (aka to not extract comments). This can speed up the fetching, because by default, it does look for them. The configuration methods all have the same format, though, so to accomplish this, just use setDiscussion:
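For example:

```php
// don't look for (or extract) comments on the article's page
$articleApi->setDiscussion(false);
```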

All config methods are chainable:
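So the two settings above can be combined:

```php
$articleApi
    ->setMeta(true)
    ->setDiscussion(false);
```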

For an overview of all the config fields and the values each API returns, see here.

Calling

All API instances have the call method which returns a collection of results. The collection is iterable:
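A sketch using the Image API, assuming each Image entity exposes a getUrl() getter:

```php
$images = $diffbot->createImageAPI($url)->call();

foreach ($images as $image) {
    echo $image->getUrl(), PHP_EOL;
}
```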

In cases where only one entity is returned, as with Article or Product, iterating works all the same; it just iterates through that one element. The returned data is always a collection!

However, for brevity, you can access properties directly on the collection, too.
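For example, with an Article result:

```php
$article = $diffbot->createArticleAPI($url)->call();

// proxied to the first (and here, only) entity in the collection
echo $article->getTitle();
```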

In this case, the collection applies the property call to the first element which, coincidentally, is also the only element. If you use this approach on the image collection above, the same thing happens - but the call is only applied to the first image entity in the collection.

Just the URL, please

If you just want the final generated URL (for example, to paste into Postman Client or to test in the browser and get pure JSON), use buildUrl:
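For example:

```php
$articleApi = $diffbot->createArticleAPI($url);

echo $articleApi->buildUrl();
// e.g. https://api.diffbot.com/v3/article?token=...&url=...

$article = $articleApi->call(); // regular usage still works afterwards
```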

You can continue regular API usage afterwards, which makes this very useful for logging, etc.

Pure response

You can extract the pure, full Guzzle Response object from the returned data and then manipulate it as desired (maybe parsing it as JSON and processing it further on your own):
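A sketch, assuming the collection exposes the response via a getResponse() method:

```php
$collection = $diffbot->createArticleAPI($url)->call();

/** @var \GuzzleHttp\Psr7\Response $response */
$response = $collection->getResponse();

// process the raw JSON body on your own
$data = json_decode((string) $response->getBody(), true);
```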

Individual entities do not have access to the response - to fetch it, always fetch from their parent collection (the object that the call() method returns).

Discussion and Post

The Discussion API returns some data about the discussion and contains another collection of Posts. A Post entity corresponds to a single review / comment / forum post, and is very similar in structure to the Article entity.

You can iterate through the posts as usual:
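A sketch - the getter names on the Post entity are assumptions for illustration:

```php
$discussion = $diffbot->createDiscussionAPI($url)->call();

foreach ($discussion->getPosts() as $post) {
    echo $post->getAuthor(), ': ', $post->getText(), PHP_EOL;
}
```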

An Article or Product entity can contain a Discussion entity. Access it via getDiscussion on an Article or Product entity and use as usual (see above).
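For example:

```php
$article = $diffbot->createArticleAPI($url)->call();

foreach ($article->getDiscussion()->getPosts() as $post) {
    // ...
}
```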

Custom API

Used just like all others. There are only two differences:

  1. When creating a Custom API call, you need to pass in the API name
  2. It always returns Wildcard entities which are basically just value objects containing the returned data. They have __call and __get magic methods defined so their properties remain just as accessible as the other Entities', but without autocomplete.

The following is a usage example of my own custom API for author profiles at SitePoint:
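A sketch of that call - the authorFolioNew API name and the author property belong to that particular custom ruleset and are purely illustrative:

```php
$customApi = $diffbot->createCustomAPI(
    'http://sitepoint.com/author/bskvorc',
    'authorFolioNew'
);

foreach ($customApi->call() as $wildcard) {
    // magic __get access on the Wildcard entity
    echo $wildcard->author, PHP_EOL;
}
```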

Of course, you can easily extend the basic Custom API class and make your own, as well as add your own Entities that perfectly correspond to the returned data. This will all be covered in a tutorial in the near future.

Crawlbot and Bulk API

Basic Crawlbot support has been added to the library. To find out more about Crawlbot and what, how and why it does what it does, see here. I also recommend reading the Crawlbot API docs and the Crawlbot support topics just so you can dive right in without being too confused by the code below.

In a nutshell, the Crawlbot crawls a set of seed URLs for links (even if a subdomain is passed to it as seed URL, it still looks through the entire main domain and all other subdomains it can find) and then processes all the pages it can find using the API you define (or opting for Analyze API by default).

List of all crawl / bulk jobs

A joint list of all your crawl / bulk jobs can be fetched via:
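For example:

```php
$jobs = $diffbot->crawl()->call();

foreach ($jobs as $job) {
    // each $job is a JobCrawl or JobBulk instance
    echo get_class($job), PHP_EOL;
}
```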

This returns a collection of all crawl and bulk jobs. Each type is represented by its own class: JobCrawl and JobBulk. It's important to note that Jobs only contain the information about the job - not the data. To get the data of a job, use the downloadUrl method to get the URL to the dataset:
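For example, on a fetched job instance:

```php
$jsonDataUrl = $job->downloadUrl('json'); // or 'csv'
```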

Crawl jobs: Creating a Crawl Job

See the inline comments for a step-by-step explanation:
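A sketch of the whole flow - the job name, seed URL, and the 'crawl' dummy URL passed to the API factory are illustrative:

```php
use Swader\Diffbot\Entity\JobCrawl;

// 1. Define which API the crawler should process found pages with;
//    the URL passed here is a dummy, only the API type matters
$apiInstance = $diffbot->createArticleAPI('crawl')->setDiscussion(false);

// 2. Create a new crawl job with a unique name, bound to that API
$crawl = $diffbot->crawl('sitepoint_01', $apiInstance);

// 3. Define the seed URL(s) the crawler starts from
$crawl->setSeeds(['http://sitepoint.com']);

// 4. Execute - registers the job with Diffbot and returns a JobCrawl
/** @var JobCrawl $job */
$job = $crawl->call();
```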

Crawl jobs: Inspecting an existing Crawl Job

To get data about a job (this will be the data it was configured with - its flags - and not the results!), use the exact same approach as if creating a new one, only without the API and seeds:
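For example, for a job named sitepoint_01:

```php
$crawl = $diffbot->crawl('sitepoint_01');

$job = $crawl->call(); // flags of the existing job, not its results
```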

Crawl jobs: Modifying an existing Crawl Job

While there is no way to alter a crawl job's configuration post creation, you can still do some operations on it.

Provided you fetched a $crawl instance as in the above section on inspecting, you can do the following:
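A sketch of those operations, assuming the action-method names below:

```php
$crawl->pause();    // pause the crawl job
$crawl->unpause();  // resume it
$crawl->restart();  // discard collected data and start over
$crawl->delete();   // remove the job and its data
```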

Note that it is not necessary to issue a call() after these methods.

If you would like to extract the generated API call URL for these instant-call actions, pass in the parameter false, like so:
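A sketch, assuming the false flag defers execution as described:

```php
$crawl->pause(false);      // nothing is executed yet
$url = $crawl->buildUrl(); // grab the generated URL, e.g. for logging

// ... later, when ready:
$crawl->call();
```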

You can then save the URL for your convenience and invoke call() when ready to execute (if at all).

Search API

The Search API is used to quickly search across data obtained through Bulk or Crawl API.

Use the Search API's setCol method to target a specific collection only - otherwise, all of your token's collections are searched.
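A sketch - the query syntax follows Diffbot's Search API, and the collection name is illustrative:

```php
$search = $diffbot->search('author:"Bruno Skvorc"');
$search->setCol('sp_search'); // optional: search only this collection

foreach ($search->call() as $article) {
    echo $article->getTitle(), PHP_EOL;
}
```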

Testing

Just run PHPUnit in the root folder of the cloned project. Some calls do require an internet connection (see tests/Factory/EntityTest).

Adding Entity tests

I'll pay $10 for every new set of 5 Entity tests, submissions verified set per set - offer valid until I feel like there's enough use cases covered. (a.k.a. don't submit 1500 of them at once, I can't pay that in one go).

If you would like to contribute by adding Entity tests, I suggest following this procedure:

  1. Pick an API you would like to contribute a test for. E.g., Product API.
  2. In a scratchpad like index.php, build the URL:

  3. Grab the URL and paste it into a REST client like Postman or into your browser. You'll get Diffbot's response back. Keep it open for reference.
  4. Download this response into a JSON file, preferably into tests/Mocks/Products/[date]/somefilename.json, like the other tests. This is easily accomplished by executing curl "[url]" > somefilename.json in the Terminal/Command Line.
  5. Go into the appropriate tests folder - in this case, tests/Entity - and open ProductTest.php. Notice how each file is added into the batch of files to be tested against: every data provider references it, along with the value the method being tested should produce. Slowly go through every test method and add your file, using the values from the JSON you obtained in step 3.
  6. Run phpunit tests/Entity/ProductTest.php to test just this file (much faster than entire suite). If OK, send PR :)
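Step 2 above can be sketched as (the token and product URL are placeholders):

```php
<?php

require_once 'vendor/autoload.php';

use Swader\Diffbot\Diffbot;

$diffbot = new Diffbot('my_token');

$url = 'http://www.amazon.com/dp/B012EXAMPLE/';
echo $diffbot->createProductAPI($url)->buildUrl();
```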

If you'd like to create your own Test classes, too, that's fine - there's no need to extend the ones included with the project. Apply the whole process as described, just make a new class rather than extending the existing ProductTest class.

Adding other tests

Other tests don't have specific instructions, contribute as you see fit. Just try to minimize actual remote calls - we're not testing the API itself (a.k.a. Diffbot), we're testing this library. If the library parses values accurately from an inaccurate API response because, for example, Diffbot is currently bugged, that's fine - the library works!

Contributing

Please see TODO for ideas.

Credits

License

The MIT License (MIT). Please see License File for more information.

