Download the PHP package cerbero/lazy-json-pages without Composer
On this page you can find all versions of the php package cerbero/lazy-json-pages. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package lazy-json-pages
Short Description Framework-agnostic package to load items from any paginated JSON API into a Laravel lazy collection via async HTTP requests.
License MIT
Homepage https://github.com/cerbero90/lazy-json-pages
Information about the package lazy-json-pages
Lazy JSON Pages
Framework-agnostic API scraper to load items from any paginated JSON API into a Laravel lazy collection via async HTTP requests.
[!TIP] Need to read large JSON with no pagination in a memory-efficient way?
Consider using Lazy JSON or JSON Parser instead.
Install
Via Composer: `composer require cerbero/lazy-json-pages`
Usage
- Basics
- Sources
- Pagination structure
- Length-aware paginations
- Cursor-aware paginations
- Link header paginations
- Custom paginations
- Requests optimization
- Errors handling
- Laravel integration
Basics
Depending on our coding style, we can instantiate Lazy JSON Pages in 4 different ways:
The variable `$source` in our examples represents any source that points to a paginated JSON API. Once we define the source, we can then chain methods to define how the API is paginated.
When calling `collect()`, we indicate that the pagination structure is defined and that we are ready to collect the paginated items within a Laravel lazy collection, where we can loop through the items one by one and apply filters and transformations in a memory-efficient way.
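To illustrate what "lazy" means here, the following is a minimal sketch in plain PHP and not the package's own code. `fetchPage()` is a hypothetical stand-in for an HTTP call, and the `data` and `total_pages` keys are assumptions made for the example.

```php
<?php

/**
 * Hypothetical stand-in for an HTTP call: returns the decoded JSON body of one page.
 */
function fetchPage(int $page): array
{
    $pages = [
        1 => ['data' => [['id' => 1], ['id' => 2]], 'total_pages' => 3],
        2 => ['data' => [['id' => 3], ['id' => 4]], 'total_pages' => 3],
        3 => ['data' => [['id' => 5]], 'total_pages' => 3],
    ];

    return $pages[$page];
}

/**
 * Yield items one by one, requesting a page only when its items are needed.
 */
function lazyItems(): Generator
{
    $page = 1;

    do {
        $body = fetchPage($page);

        foreach ($body['data'] as $item) {
            yield $item;
        }
    } while ($page++ < $body['total_pages']);
}

// Only the current page is held in memory, and stopping early skips the remaining pages.
foreach (lazyItems() as $item) {
    if ($item['id'] === 3) {
        break; // page 3 is never requested
    }
}
```

A generator is used because it produces each item on demand, which is the same property a lazy collection relies on.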
Sources
A source is any means that can point to a paginated JSON API. A number of sources are supported by default:
- endpoint URIs, e.g. `https://example.com/api/v1/users` or any instance of `Psr\Http\Message\UriInterface`
- PSR-7 requests, i.e. any instance of `Psr\Http\Message\RequestInterface`
- Laravel HTTP client requests, i.e. any instance of `Illuminate\Http\Client\Request`
- Laravel HTTP client responses, i.e. any instance of `Illuminate\Http\Client\Response`
- Laravel HTTP requests, i.e. any instance of `Illuminate\Http\Request`
- Symfony requests, i.e. any instance of `Symfony\Component\HttpFoundation\Request`
- user-defined sources, i.e. any instance of `Cerbero\LazyJsonPages\Sources\Source`
Here are some examples of sources:
If none of the above sources satisfies our use case, we can implement our own source.
To implement a custom source, we need to extend `Source` and implement 2 methods.
The parent class `Source` gives us access to 2 properties:
- `$source`: the custom source for our use case
- `$client`: the Guzzle HTTP client
The methods to implement turn our custom source into a PSR-7 request and a PSR-7 response. Please refer to the [already existing sources](https://github.com/cerbero90/json-parser/tree/master/src/Sources) to see some implementations.
Once the custom source is implemented, we can instruct Lazy JSON Pages to use it.
If you find yourself implementing the same custom source in different projects, feel free to send a PR and we will consider supporting your custom source by default. Thank you in advance for any contribution!
Pagination structure
After defining the source, we need to let Lazy JSON Pages know what the paginated API looks like.
If the API uses a query parameter different from `page` to specify the current page (for example `?current_page=1`), we can chain the method `pageName()`.
Otherwise, if the number of the current page is present in the URI path (for example `https://example.com/users/page/1`), we can chain the method `pageInPath()`.
By default the last integer in the URI path is considered the page number. However, we can customize the regular expression used to capture the page number, if need be.
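As a rough illustration of capturing a page number from a URI path, here is a sketch in plain PHP. The function name and both regular expressions are assumptions made for this example and are not taken from the package.

```php
<?php

/**
 * Capture the page number from the path of a URI.
 * The default pattern matches the last integer in the path (an assumption for this sketch).
 */
function pageFromPath(string $uri, string $pattern = '~(\d+)(?!.*\d)~'): ?int
{
    // Only the path is inspected, so digits in the host or query string are ignored.
    $path = (string) parse_url($uri, PHP_URL_PATH);

    return preg_match($pattern, $path, $matches) === 1 ? (int) $matches[1] : null;
}

// Last integer in the path.
$default = pageFromPath('https://example.com/v2/users/page/12');

// Custom pattern for APIs where the page is not the last integer.
$custom = pageFromPath('https://example.com/page/3/users/50', '~/page/(\d+)/~');
```

The negative lookahead `(?!.*\d)` rejects any run of digits that is followed by another digit later in the path, which is why `v2` is skipped in the first call.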
Some API paginations may start with a page different from `1`. If that's the case, we can define the first page by chaining the method `firstPage()`.
Now that we have customized the basic structure of the API, we can describe how items are paginated depending on whether the pagination is length-aware, cursor-aware or driven by a Link header.
Length-aware paginations
The term "length-aware" indicates any pagination containing at least one of the following pieces of length information:
- the total number of pages
- the total number of items
- the number of the last page
Lazy JSON Pages only needs one of these details to work properly:
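Why a single detail is enough can be sketched in plain PHP: each of the three values leads to the same page count. The function and its parameter names are hypothetical, and the sketch assumes that every page holds the same number of items.

```php
<?php

/**
 * Work out how many pages exist from whichever length detail the API exposes.
 * All names are assumptions made for this sketch.
 */
function countPages(?int $totalPages, ?int $totalItems, ?int $lastPage, int $perPage, int $firstPage = 1): int
{
    if ($totalPages !== null) {
        return $totalPages;
    }

    if ($totalItems !== null) {
        // A partially filled last page still counts as a page.
        return $perPage > 0 ? (int) ceil($totalItems / $perPage) : 0;
    }

    if ($lastPage !== null) {
        // The last page number depends on where the numbering starts.
        return $lastPage - $firstPage + 1;
    }

    throw new InvalidArgumentException('At least one length detail is required.');
}

$fromItems = countPages(totalPages: null, totalItems: 95, lastPage: null, perPage: 10);
$fromLastPage = countPages(totalPages: null, totalItems: null, lastPage: 9, perPage: 10, firstPage: 0);
```

Both calls describe the same 10-page API, once through its 95 items and once through a last page numbered 9 when numbering starts at 0.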
If the length information is nested in the JSON body, we can use dot-notation to indicate the level of nesting. For example, `pagination.total_pages` means that the total number of pages sits in the object `pagination`, under the key `total_pages`.
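Dot-notation lookups of this kind can be sketched in a few lines of plain PHP. The helper `valueAt()` is hypothetical and only illustrates the idea. It is not how the package resolves keys internally.

```php
<?php

/**
 * Resolve a dot-notation path such as "pagination.total_pages" in a decoded JSON body.
 * Returns null when any segment of the path is missing.
 */
function valueAt(array $json, string $path): mixed
{
    $value = $json;

    foreach (explode('.', $path) as $key) {
        if (!is_array($value) || !array_key_exists($key, $value)) {
            return null;
        }

        $value = $value[$key];
    }

    return $value;
}

$body = json_decode('{"pagination": {"total_pages": 10}, "data": []}', true);
$totalPages = valueAt($body, 'pagination.total_pages'); // 10
```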
Otherwise, if the length information is displayed in the headers, we can use the same methods to gather it by simply defining the name of the header:
APIs can expose their length information in the form of numbers (`total_pages: 10`) or URIs (`last_page: "https://example.com?page=10"`). Lazy JSON Pages supports both.
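Handling both forms can be sketched as follows in plain PHP. The function name is hypothetical and the sketch assumes the page sits in a query parameter. The package may resolve URIs differently.

```php
<?php

/**
 * Turn a length value into a page number, whether it is a number or a URI.
 * $pageName is the query parameter holding the page (an assumption for this sketch).
 */
function pageNumberFrom(int|string $value, string $pageName = 'page'): ?int
{
    if (is_numeric($value)) {
        return (int) $value;
    }

    // Otherwise treat the value as a URI and read the page from its query string.
    parse_str((string) parse_url($value, PHP_URL_QUERY), $query);

    return is_numeric($query[$pageName] ?? null) ? (int) $query[$pageName] : null;
}

$fromNumber = pageNumberFrom(10);
$fromUri = pageNumberFrom('https://example.com/users?page=10');
```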
If the pagination works with an offset, we can configure it with the `offset()` method. The value of the offset will be calculated based on the number of items present on the first page.
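The arithmetic behind an offset pagination can be sketched like this in plain PHP. The function is hypothetical and assumes zero-based offsets and pages that all hold as many items as the first one.

```php
<?php

/**
 * List the offset of every page, given the total number of items
 * and the number of items found on the first page.
 */
function pageOffsets(int $totalItems, int $itemsOnFirstPage): array
{
    if ($totalItems < 1 || $itemsOnFirstPage < 1) {
        return [];
    }

    // One offset per page: 0, then one page size further each time.
    return range(0, $totalItems - 1, $itemsOnFirstPage);
}

$offsets = pageOffsets(totalItems: 25, itemsOnFirstPage: 10); // [0, 10, 20]
```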
Cursor-aware paginations
Not all paginations are length-aware: some are built so that each page has a cursor pointing to the next page.
We can tackle this kind of pagination by indicating the key or the header holding the cursor:
The cursor may be a number, a string or a URI: Lazy JSON Pages supports them all.
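Following cursors can be sketched with a generator in plain PHP. `fetchByCursor()` is a hypothetical stand-in for an HTTP call, and the `next_cursor` key is an assumption made for this example.

```php
<?php

/**
 * Hypothetical stand-in for an HTTP call: returns the page identified by the given cursor.
 */
function fetchByCursor(?string $cursor): array
{
    $pages = [
        '' => ['data' => ['a', 'b'], 'next_cursor' => 'abc'],
        'abc' => ['data' => ['c', 'd'], 'next_cursor' => 'def'],
        'def' => ['data' => ['e'], 'next_cursor' => null],
    ];

    return $pages[$cursor ?? ''];
}

/**
 * Yield items page after page until a page no longer provides a cursor.
 */
function itemsByCursor(): Generator
{
    $cursor = null;

    do {
        $body = fetchByCursor($cursor);

        foreach ($body['data'] as $item) {
            yield $item;
        }

        $cursor = $body['next_cursor'];
    } while ($cursor !== null);
}
```

Unlike length-aware paginations, each request here depends on the previous response, so the pages have to be fetched one after another.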
Link header paginations
Some paginated API responses include a header called `Link`. An example is GitHub: if we inspect the response headers, we can see the `Link` header listing the URIs of related pages.
To lazy-load items from a Link header pagination, we can chain the method `linkHeader()`.
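For readers unfamiliar with the format, here is a sketch of reading a `Link` header in plain PHP. The header value is illustrative and uses example.com, the function is hypothetical, and the regular expression only covers the common case where `rel` is the first parameter of each link.

```php
<?php

/**
 * Parse a Link header into a map of relation => URI.
 */
function parseLinkHeader(string $header): array
{
    $links = [];

    // Each link looks like: <uri>; rel="name"
    preg_match_all('/<([^>]+)>\s*;\s*rel="([^"]+)"/', $header, $matches, PREG_SET_ORDER);

    foreach ($matches as [, $uri, $rel]) {
        $links[$rel] = $uri;
    }

    return $links;
}

$links = parseLinkHeader(
    '<https://example.com/items?page=2>; rel="next", <https://example.com/items?page=5>; rel="last"'
);
```

With such a map, the `next` relation tells which page to request next and the `last` relation reveals how many pages exist.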
Custom paginations
Lazy JSON Pages provides several methods to extract items from the most popular pagination mechanisms. However, if we need a custom solution, we can implement our own pagination.
To implement a custom pagination, we need to extend `Pagination` and implement 1 method.
The parent class `Pagination` gives us access to 3 properties:
- `$source`: the [source](#-sources) pointing to the paginated JSON API
- `$client`: the Guzzle HTTP client
- `$config`: the configuration that we generated by chaining methods like `totalPages()`
The method `getIterator()` defines the logic to extract paginated items in a memory-efficient way. Please refer to the [already existing paginations](https://github.com/cerbero90/json-parser/tree/master/src/Paginations) to see some implementations.
Once the custom pagination is implemented, we can instruct Lazy JSON Pages to use it.
If you find yourself implementing the same custom pagination in different projects, feel free to send a PR and we will consider supporting your custom pagination by default. Thank you in advance for any contribution!
Requests optimization
Paginated APIs differ from each other, so Lazy JSON Pages lets us tweak our HTTP requests specifically for our use case.
By default HTTP requests are sent synchronously. If we want to send more than one request without waiting for the response, we can call the `async()` method and set the number of concurrent requests.
[!NOTE]
Please note that asynchronous requests improve speed at the expense of memory, as more responses are going to be loaded at once.
Several APIs set rate limits to reduce the number of allowed requests for a period of time. We can instruct Lazy JSON Pages to respect such limits by throttling our requests:
Internally, Lazy JSON Pages uses Guzzle as its HTTP client. We can customize the client behavior by adding as many middleware as we need:
If we need a middleware to be added every time we invoke Lazy JSON Pages, we can add a global middleware:
Sometimes writing Guzzle middleware might be cumbersome. Alternatively Lazy JSON Pages provides convenient methods to fire callbacks when sending a request or receiving a response:
We can also tweak the number of allowed seconds before an API connection times out or the allowed duration of the entire HTTP request (by default they are both set to 5 seconds):
If the 3rd party API is faulty or error-prone, we can indicate how many times we want to retry failing HTTP requests and the backoff strategy to compute the milliseconds to wait before retrying (by default failing requests are repeated 3 times after an exponential backoff of 100, 400 and 900 milliseconds):
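The default delays quoted above (100, 400 and 900 milliseconds) match the attempt number squared, times 100. The retry loop below is a sketch in plain PHP under that assumption. Both function names are hypothetical, and the sleep callback is injectable only so that the example can run without actually waiting.

```php
<?php

/**
 * Milliseconds to wait before the given retry: 100, 400, 900, ...
 */
function backoffMilliseconds(int $attempt): int
{
    return $attempt ** 2 * 100;
}

/**
 * Run $send, retrying up to $retries times when it throws a RuntimeException.
 */
function retry(callable $send, int $retries, callable $backoff, ?callable $sleep = null): mixed
{
    // By default really pause; usleep() expects microseconds.
    $sleep ??= static fn (int $milliseconds) => usleep($milliseconds * 1000);
    $attempt = 0;

    while (true) {
        try {
            return $send();
        } catch (RuntimeException $e) {
            if (++$attempt > $retries) {
                throw $e; // out of attempts: let the caller handle the failure
            }

            $sleep($backoff($attempt));
        }
    }
}

// A request that fails twice before succeeding, with recorded instead of real waits.
$calls = 0;
$waits = [];
$result = retry(
    function () use (&$calls) {
        if (++$calls < 3) {
            throw new RuntimeException('Temporary failure');
        }

        return 'ok';
    },
    3,
    'backoffMilliseconds',
    function (int $milliseconds) use (&$waits) {
        $waits[] = $milliseconds;
    },
);
```

After this run `$result` is `'ok'` and `$waits` holds `[100, 400]`, because the third call succeeded before the 900 millisecond delay was needed.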
Errors handling
If something goes wrong during the scraping process, we can intercept the error and run custom logic to handle it.
Any exception thrown by this package extends the `LazyJsonPagesException` class. This makes it easy to handle all exceptions in a single catch block.
For reference, here is a comprehensive table of all the exceptions thrown by this package:
| `Cerbero\LazyJsonPages\Exceptions\` | thrown when |
|---|---|
| `InvalidKeyException` | a JSON key does not contain a valid value |
| `InvalidPageInPathException` | a page cannot be found in the URI path |
| `InvalidPaginationException` | a pagination implementation is not valid |
| `OutOfAttemptsException` | an HTTP request failed too many times |
| `RequestNotSentException` | a JSON source didn't send any HTTP request |
| `UnsupportedPaginationException` | a pagination is not supported |
| `UnsupportedSourceException` | a JSON source is not supported |
Laravel integration
If used in a Laravel project, Lazy JSON Pages automatically fires events when:
- an HTTP request is about to be sent, by firing `Illuminate\Http\Client\Events\RequestSending`
- an HTTP response is received, by firing `Illuminate\Http\Client\Events\ResponseReceived`
- a connection failed, by firing `Illuminate\Http\Client\Events\ConnectionFailed`
This is especially handy for debugging tools like Laravel Telescope or Spatie Ray or for triggering the related event listeners.
Change log
Please see CHANGELOG for more information on what has changed recently.
Testing
Contributing
Please see CODE_OF_CONDUCT for details.
Security
If you discover any security related issues, please email [email protected] instead of using the issue tracker.
Credits
- Andrea Marco Sartori
- All Contributors
License
The MIT License (MIT). Please see License File for more information.
All versions of lazy-json-pages with dependencies
cerbero/json-parser Version ^1.1
guzzlehttp/guzzle Version ^7.2
illuminate/collections Version >=8.12