Package: phpoaipmh
Short Description: A PHP OAI-PMH 2.0 Harvester library
License: MIT
Homepage: https://github.com/caseyamcl/phpoaipmh
PHPOAIPMH
A PHP OAI-PMH harvester client library
This library provides an interface to harvest OAI-PMH metadata from any OAI-PMH 2.0-compliant endpoint.
Features:
- PSR-12 Compliant
- Composer-compatible
- Unit-tested
- Prefers Guzzle (v6, v7, or v5) for the HTTP transport layer, but can fall back to cURL, or implement your own
- Easy-to-use iterator that hides all the HTTP junk necessary to get paginated records
Installation Options
Install via Composer by including the following in your composer.json file:
{
    "require": {
        "caseyamcl/phpoaipmh": "^3.0",
        "guzzlehttp/guzzle": "^7.0"
    }
}
Or, drop the src folder into your application and use a PSR-4 autoloader to include the files.
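If you go the manual route, a PSR-4 autoloader only has to map the Phpoaipmh\ namespace prefix onto the directory where you copied src. The sketch below is a generic closure in the style of the PSR-4 reference example, not something shipped with this library; the psr4Path() helper and the lib/phpoaipmh/src location are hypothetical.

```php
<?php

/**
 * Map a fully qualified class name to a file path under PSR-4 rules.
 * Returns null when the class is not under the given namespace prefix.
 */
function psr4Path(string $prefix, string $baseDir, string $class): ?string
{
    // Only handle classes that live under the namespace prefix
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Strip the prefix and turn namespace separators into directory separators
    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

// Hypothetical location of the copied src folder; adjust to your layout
$phpoaipmhSrc = __DIR__ . '/lib/phpoaipmh/src';

spl_autoload_register(function (string $class) use ($phpoaipmhSrc): void {
    $file = psr4Path('Phpoaipmh\\', $phpoaipmhSrc, $class);
    if ($file !== null && is_file($file)) {
        require $file;
    }
});
```

With that in place, a class such as Phpoaipmh\HttpAdapter\CurlAdapter resolves to src/HttpAdapter/CurlAdapter.php. Guzzle, if you use it, still needs its own autoloading.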
Note: Guzzle v6.0 or v7.0 is recommended, but if you do not wish to use either for whatever reason, you can use any one of the following:
- Guzzle 5.0 - You can use Guzzle v5 instead of v6 or v7.
- cURL - This library will fall back to using cURL if Guzzle is not installed.
- Build your own - You can use a different HTTP client library by passing your own implementation of the Phpoaipmh\HttpAdapter\HttpAdapterInterface to the Phpoaipmh\Client constructor.
Upgrading
There are several backwards-incompatible API improvements in major version changes. See
Usage
Set up a new endpoint client:
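Under the hood, every OAI-PMH request is an HTTP GET against the endpoint's base URL, carrying a verb parameter plus verb-specific arguments. The buildOaiUrl() helper below is a hypothetical illustration of that URL scheme, not the library's internal code:

```php
<?php

/**
 * Build an OAI-PMH GET request URL: base URL + verb + verb-specific arguments.
 * Hypothetical helper for illustration; not part of the Phpoaipmh API.
 */
function buildOaiUrl(string $baseUrl, string $verb, array $params = []): string
{
    // The verb comes first; null arguments are dropped (http_build_query skips them too)
    $query = array_merge(['verb' => $verb], array_filter($params, function ($v) {
        return $v !== null;
    }));

    // Use '&' if the base URL already carries its own query string
    $separator = strpos($baseUrl, '?') === false ? '?' : '&';

    return $baseUrl . $separator . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

// Identify takes no arguments
echo buildOaiUrl('http://some.service.com/oai', 'Identify'), PHP_EOL;
// prints http://some.service.com/oai?verb=Identify

// ListRecords needs a metadataPrefix (unless resuming with a resumptionToken)
echo buildOaiUrl('http://some.service.com/oai', 'ListRecords', ['metadataPrefix' => 'oai_dc']), PHP_EOL;
// prints http://some.service.com/oai?verb=ListRecords&metadataPrefix=oai_dc
```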
Get basic information:
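Basic information comes from the OAI-PMH Identify verb, which the library returns as a SimpleXMLElement. As a rough sketch of what that object contains, the code below reads a few commonly used fields from a trimmed, made-up Identify response with plain SimpleXML (real responses carry more elements, such as adminEmail):

```php
<?php

// A trimmed, made-up Identify response; not from a real repository
$xmlString = <<<XML
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="Identify">http://some.service.com/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://some.service.com/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>
XML;

$response = simplexml_load_string($xmlString);

// Elements in the default OAI-PMH namespace are reachable with plain property access
$identify = $response->Identify;
$info = [
    'name'        => (string) $identify->repositoryName,
    'earliest'    => (string) $identify->earliestDatestamp,
    'granularity' => (string) $identify->granularity,
];

print_r($info);
```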
Retrieving records
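Each item you get back when retrieving records is typically an OAI-PMH record element: a header (identifier, datestamp, zero or more setSpec entries, and an optional status="deleted" attribute) followed, unless the record is deleted, by a metadata section. The hypothetical recordHeader() helper below sketches how to pull the header fields out of such an element with SimpleXML; the sample record is made up.

```php
<?php

/**
 * Extract the header fields of an OAI-PMH <record> element.
 * Hypothetical helper for illustration only.
 */
function recordHeader(SimpleXMLElement $record): array
{
    $header = $record->header;

    // A record can belong to several sets, one <setSpec> per set
    $sets = [];
    foreach ($header->setSpec as $spec) {
        $sets[] = (string) $spec;
    }

    return [
        'identifier' => (string) $header->identifier,
        'datestamp'  => (string) $header->datestamp,
        'sets'       => $sets,
        // Deleted records are flagged with status="deleted" and carry no metadata
        'deleted'    => (string) $header['status'] === 'deleted',
    ];
}

$xmlString = <<<XML
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:example.org:42</identifier>
    <datestamp>2013-01-15</datestamp>
    <setSpec>physics</setSpec>
  </header>
</record>
XML;

$record = simplexml_load_string($xmlString);
print_r(recordHeader($record));
```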
Limiting record retrieval by date/time
Simply pass instances of DateTimeInterface to Endpoint::listRecords() or Endpoint::listIdentifiers() as the second (from) and third (until) arguments.
If you want to set one bound and not the other, you can pass null for either argument.
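On the wire, those two arguments become the OAI-PMH from and until query arguments, and a null bound is simply left out of the request. The hypothetical dateRangeParams() helper below sketches that mapping, assuming day-level (YYYY-MM-DD) granularity and converting to UTC, since OAI-PMH datestamps are expressed in UTC:

```php
<?php

/**
 * Map optional from/until bounds to OAI-PMH query arguments at day granularity.
 * Hypothetical helper; the library builds these arguments internally.
 */
function dateRangeParams(?DateTimeInterface $from, ?DateTimeInterface $until): array
{
    $params = [];

    foreach (['from' => $from, 'until' => $until] as $name => $date) {
        if ($date === null) {
            continue; // a null bound is omitted from the request, not sent empty
        }
        // '@<timestamp>' is always interpreted as UTC, which is what OAI-PMH datestamps use
        $params[$name] = (new DateTimeImmutable('@' . $date->getTimestamp()))->format('Y-m-d');
    }

    return $params;
}

// Records datestamped on or after 1 January 2013, with no upper bound
print_r(dateRangeParams(new DateTimeImmutable('2013-01-01', new DateTimeZone('UTC')), null));
```

Note the UTC conversion: a local time shortly after midnight in a zone east of UTC can fall on the previous UTC day.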
Setting date/time granularity
This library will attempt to retrieve the granularity automatically from the endpoint's OAI-PMH Identify response, but if you want to set it yourself, you can pass an instance of Granularity to the Endpoint constructor:
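Whichever way it is determined, the granularity controls how from and until values are serialized. OAI-PMH 2.0 defines exactly two granularities, YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ, both in UTC. A sketch of that formatting, using a hypothetical formatOaiDatestamp() function rather than the library's own code:

```php
<?php

/**
 * Format a date for OAI-PMH at one of the two granularities the protocol defines.
 * Hypothetical helper for illustration only.
 */
function formatOaiDatestamp(DateTimeInterface $date, string $granularity): string
{
    // OAI-PMH datestamps are UTC; '@<timestamp>' is always interpreted as UTC
    $utc = new DateTimeImmutable('@' . $date->getTimestamp());

    switch ($granularity) {
        case 'YYYY-MM-DD':
            return $utc->format('Y-m-d');
        case 'YYYY-MM-DDThh:mm:ssZ':
            // \T and \Z are escaped so they appear literally in the output
            return $utc->format('Y-m-d\TH:i:s\Z');
        default:
            throw new InvalidArgumentException("Unsupported granularity: $granularity");
    }
}

$date = new DateTimeImmutable('2013-01-01 12:30:45', new DateTimeZone('UTC'));
echo formatOaiDatestamp($date, 'YYYY-MM-DDThh:mm:ssZ'), PHP_EOL; // prints 2013-01-01T12:30:45Z
```

Sending a seconds-level value to a repository that only supports day granularity is an error under the protocol (badArgument), which is why knowing the granularity matters.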
Record sets
Some OAI-PMH endpoints sub-divide records into sets.
You can list the record sets available for a given endpoint by calling Endpoint::listSets():
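Each OAI-PMH set element carries a machine-readable setSpec and a human-readable setName. The sketch below collects both from a made-up ListSets response with plain SimpleXML; assuming each item yielded by listSets() is such a set element, the same loop body applies to the library's results.

```php
<?php

// A made-up ListSets response fragment
$xmlString = <<<XML
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>physics</setSpec><setName>Physics</setName></set>
    <set><setSpec>physics:optics</setSpec><setName>Optics</setName></set>
  </ListSets>
</OAI-PMH>
XML;

$response = simplexml_load_string($xmlString);

// Map each setSpec (the value used when requesting a set) to its setName
$sets = [];
foreach ($response->ListSets->set as $set) {
    $sets[(string) $set->setSpec] = (string) $set->setName;
}

print_r($sets);
```

The colon in physics:optics is how OAI-PMH expresses set hierarchy.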
You can specify the set you wish to retrieve by passing its set spec (the setSpec value, which is what OAI-PMH expects in the set argument) as the fourth argument to Endpoint::listIdentifiers() or Endpoint::listRecords():
Getting total record count
Some endpoints provide a total record count for your query. If the endpoint provides this, you can access the value by calling RecordIterator::getTotalRecordCount().
If the endpoint does not provide this count, RecordIterator::getTotalRecordCount() returns null.
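In OAI-PMH, that count travels as the optional completeListSize attribute on the resumptionToken element, which is why it can be missing. The hypothetical completeListSize() helper below reads it from a raw response and returns null when it is absent:

```php
<?php

/**
 * Return completeListSize from a ListRecords/ListIdentifiers response, or null.
 * Hypothetical helper for illustration only.
 */
function completeListSize(SimpleXMLElement $response, string $verb = 'ListRecords'): ?int
{
    // The token sits inside the verb element: <ListRecords>...<resumptionToken/></ListRecords>
    if (!isset($response->{$verb}->resumptionToken['completeListSize'])) {
        return null; // no token at all, or a token without the optional attribute
    }

    return (int) $response->{$verb}->resumptionToken['completeListSize'];
}

$xml = simplexml_load_string(
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    . '<resumptionToken completeListSize="250" cursor="0">abc</resumptionToken>'
    . '</ListRecords></OAI-PMH>'
);
var_dump(completeListSize($xml)); // int(250)
```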
Handling Results
Depending on the verb you use, the library will send back either a SimpleXMLElement or an iterator containing SimpleXMLElement objects.
- For identify and getRecord, a SimpleXMLElement object is returned
- For listMetadataFormats, listSets, listIdentifiers, and listRecords, a Phpoaipmh\ResponseIterator is returned
The Phpoaipmh\ResponseIterator object encapsulates the logic to iterate through paginated sets of records.
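The pagination such an iterator hides follows a fixed OAI-PMH pattern: request a list, yield its items, and, if the response ends with a non-empty resumptionToken, request again with only the verb and that token, until the token comes back empty or missing. The generator below is a simplified sketch of that loop, not the library's ResponseIterator; $fetch is a hypothetical callable that takes a URL and returns the response body, so any HTTP client (or a stub) can be plugged in.

```php
<?php

/**
 * Yield <record> elements from a ListRecords harvest, following resumption tokens.
 * Simplified, hypothetical sketch; not the library's implementation.
 *
 * @param callable $fetch Takes a request URL, returns the XML response body
 */
function harvestRecords(callable $fetch, string $baseUrl, string $metadataPrefix): Generator
{
    $query = ['verb' => 'ListRecords', 'metadataPrefix' => $metadataPrefix];

    while (true) {
        $response = simplexml_load_string($fetch($baseUrl . '?' . http_build_query($query)));

        if ($response === false) {
            throw new RuntimeException('Response is not well-formed XML');
        }
        // Any OAI-PMH error ends the harvest here (a fuller version might treat
        // noRecordsMatch as an empty result instead)
        if (isset($response->error)) {
            throw new RuntimeException('OAI-PMH error: ' . $response->error['code']);
        }

        foreach ($response->ListRecords->record as $record) {
            yield $record;
        }

        // An empty or missing resumptionToken marks the last page
        $token = trim((string) $response->ListRecords->resumptionToken);
        if ($token === '') {
            return;
        }

        // Follow-up requests carry only the verb and the token
        $query = ['verb' => 'ListRecords', 'resumptionToken' => $token];
    }
}
```

Because it is a generator, records are processed as each page arrives rather than after the whole list has been downloaded.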
Handling Errors
This library will throw different exceptions under different circumstances:
- HTTP request errors will generate a Phpoaipmh\Exception\HttpException
- Response body parsing issues (e.g. invalid XML) will generate a Phpoaipmh\Exception\MalformedResponseException
- OAI-PMH protocol errors (e.g. invalid verb or missing params) will generate a Phpoaipmh\Exception\OaipmhException
All exceptions extend the Phpoaipmh\Exception\BaseOaipmhException class.
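The last two cases differ in where the problem shows up. OAI-PMH protocol errors arrive in the XML body of the response, as an error element with a code attribute (for example badVerb, badArgument, or noRecordsMatch), while malformed responses fail before any element can be read. The sketch below mirrors that distinction with plain PHP; ProtocolError and parseOaiResponse() are hypothetical stand-ins, not the library's exception classes.

```php
<?php

// Hypothetical exception type carrying the OAI-PMH error code
class ProtocolError extends RuntimeException
{
    /** @var string OAI-PMH error code, e.g. "badVerb" */
    private $oaiCode;

    public function __construct(string $oaiCode, string $message)
    {
        parent::__construct($message);
        $this->oaiCode = $oaiCode;
    }

    public function getOaiCode(): string
    {
        return $this->oaiCode;
    }
}

/**
 * Parse a response body, throwing if it is malformed or reports an OAI-PMH error.
 */
function parseOaiResponse(string $body): SimpleXMLElement
{
    // Collect libxml warnings internally so malformed XML just yields false
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        throw new UnexpectedValueException('Response body is not well-formed XML');
    }

    if (isset($xml->error)) {
        throw new ProtocolError((string) $xml->error['code'], trim((string) $xml->error));
    }

    return $xml;
}
```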
Customizing Default Request Options
You can customize the default request options (for example, request timeout) for both cURL and Guzzle clients by building the adapter objects manually.
If you're using Guzzle v6, you can set default options by building your own Guzzle client and setting parameters in the constructor:
If you're using cURL, you can set request options by passing them in as an array of key/value items to CurlAdapter::setCurlOpts():
If you're using Guzzle v5, you can set default options by building your own Guzzle client,
Dealing with XML Namespaces
Many OAI-PMH XML documents make use of XML Namespaces. For non-XML experts, it can be confusing to implement these in PHP. SitePoint has a brief but excellent overview of how to use Namespaces in SimpleXML.
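As a quick illustration, oai_dc metadata puts its fields in the Dublin Core namespace (http://purl.org/dc/elements/1.1/) under a dc prefix, so plain property access will not find them. SimpleXMLElement::children() with the namespace URI works, as does XPath once prefixes are registered (including one for the default OAI-PMH namespace). The record below is made up:

```php
<?php

$xmlString = <<<XML
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An Example Title</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
XML;

$record = simplexml_load_string($xmlString);

// Option 1: step into each namespace explicitly with children()
$dc = $record->metadata
    ->children('http://www.openarchives.org/OAI/2.0/oai_dc/')->dc
    ->children('http://purl.org/dc/elements/1.1/');
$title = (string) $dc->title;

// Option 2: XPath; even the default (unprefixed) namespace needs a registered prefix
$record->registerXPathNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$record->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');
$creators = array_map('strval', $record->xpath('//oai:metadata//dc:creator'));

echo $title, PHP_EOL;
print_r($creators);
```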
Iterator Metadata
The Phpoaipmh\RecordIterator iterator contains some helper methods:
- getNumRequests() - Returns the number of HTTP requests made thus far
- getNumRetrieved() - Returns the number of individual records retrieved
- reset() - Resets the iterator, which will restart the record retrieval from scratch.
Handling 503 Retry-After Responses
Some OAI-PMH endpoints employ rate-limiting so that you can only make X number of requests in a given time period. These endpoints will respond with an HTTP 503 status code and a Retry-After header if your code generates too many HTTP requests too quickly.
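HTTP allows the Retry-After header to hold either a number of seconds or an HTTP date. The middleware and subscriber options below handle this for you, but if you write your own retry logic, a hypothetical parser like retryAfterSeconds() turns either form into a number of seconds to wait:

```php
<?php

/**
 * Convert a Retry-After header value into a number of seconds to wait.
 * Returns null when the value is neither delta-seconds nor a parseable date.
 * Hypothetical helper for illustration only.
 */
function retryAfterSeconds(string $headerValue, int $now): ?int
{
    $value = trim($headerValue);

    // Form 1: delta-seconds, e.g. "120"
    if (preg_match('/^\d+$/', $value) === 1) {
        return (int) $value;
    }

    // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    $timestamp = strtotime($value);
    if ($timestamp === false) {
        return null;
    }

    // A date already in the past means "retry now"
    return max(0, $timestamp - $now);
}

echo retryAfterSeconds('120', time()), PHP_EOL; // prints 120
```

Taking $now as a parameter instead of calling time() inside keeps the function deterministic and easy to test; pass time() in real code.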
Guzzle v6
If you have installed Guzzle v6, then you can use the Guzzle-Retry-Middleware library to automatically handle OAI-PMH endpoint rate limiting rules.
First, include the middleware as a dependency in your app:
Then, when loading the Phpoaipmh libraries, build a Guzzle client manually, and add the middleware to the stack. Example:
This will create a client that automatically retries requests when OAI-PMH endpoints send 503 rate-limiting responses.
The Retry middleware contains a number of options. Refer to the README for that package for details.
Guzzle v5
If you have installed Guzzle v5, then you can use the Retry-Subscriber to automatically handle OAI-PMH endpoint rate-limiting rules.
First, include the retry-subscriber as a dependency in your composer.json, alongside your other requirements:
"require": {
    "guzzlehttp/retry-subscriber": "~2.0"
}
Then, when loading the Phpoaipmh libraries, instantiate the Guzzle adapter manually, and add the subscriber as indicated in the code below:
This will create a client that automatically retries requests when OAI-PMH endpoints send 503 rate-limiting responses.
Sending Arbitrary Query Parameters
If you wish to send arbitrary HTTP query parameters with your requests, you can send them via the \Phpoaipmh\Client class:
$client = new \Phpoaipmh\Client('http://some.service.com/oai');
$client->request('Identify', ['some' => 'extra-param']);
Alternatively, if you wish to send arbitrary parameters while taking advantage of the convenience of the \Phpoaipmh\Endpoint class, you can use the Guzzle Param Middleware library:
First, include the middleware as a dependency in your app:
Then, when loading the Phpoaipmh libraries, build a Guzzle client manually, and add the middleware to the stack. Example:
This will add the specified query parameters to all requests for the client.
Sending arbitrary query parameters with Guzzle v5
If you are using Guzzle v5, you can use the Guzzle event system:
Implementation Tips
Harvesting data from an OAI-PMH endpoint can be a time-consuming task, especially when there are lots of records. Typically, this kind of task is done via a CLI script or background process that can run for a long time. It is not normally a good idea to make it part of a web request.
Credits
- Casey McLaughlin
- Christian Scheb
- Matthias Vandermaesen
- Sean Blommaert
- Valery Buchinsky
- All Contributors
License
MIT License; see LICENSE file for details
Dependencies
- ext-simplexml (any version)