Download the PHP package j0k3r/graby without Composer
On this page you can find all versions of the php package j0k3r/graby. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package graby
Short Description Graby helps you extract article content from web pages
License MIT
Information about the package graby
Graby helps you extract article content from web pages
- it's based on php-readability
- it uses site_config to extract content from websites
- it's a fork of Full-Text RSS v3.3 from @fivefilters
Why this fork?
Full-Text RSS works great as a standalone application. But when you need to encapsulate it in your own library it's a mess. You need this kind of ugly thing:
Also, the code is really hard to read if you want to understand how things work internally. And finally, there are no tests at all.
That's why I made this fork:
- Easier to integrate (using Composer)
- Fully tested
- (hopefully) better to understand
- A bit more decoupled
How to use it
Note: These instructions are for the development version of Graby, which has an API incompatible with the stable version. Please check out the README in the 2.x branch for usage instructions for the stable version.
Requirements
- PHP >= 7.4
- Tidy & cURL extensions enabled
Installation
Add the lib using Composer:
composer require 'j0k3r/graby dev-master' php-http/guzzle7-adapter
Why php-http/guzzle7-adapter? Because Graby is decoupled from any HTTP client implementation, thanks to HTTPlug (see its list of client implementations).
Graby is tested & should work great with:
- Guzzle 7 (using php-http/guzzle7-adapter)
- cURL (using php-http/curl-client)
Note: if you want to use Guzzle 5 or 6, use Graby 2 (support was dropped in v3 because of dependency conflicts).
Retrieve content from a URL
Use the Graby\Graby class to retrieve content by calling fetchContent() with the URL of the page.
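As a minimal sketch of that call, assuming the package was installed with Composer: the helper name fetchArticle() and the autoloader path are illustrative assumptions, not part of Graby. The shape of the returned result differs between the stable and development versions, so the sketch returns it untouched rather than reading specific fields.

```php
<?php

use Graby\Graby;

// Load Composer's autoloader when present (the path is an assumption about your project layout).
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require __DIR__ . '/vendor/autoload.php';
}

/**
 * Hypothetical helper: fetch a page and return whatever Graby extracted from it.
 */
function fetchArticle(string $url)
{
    // Without arguments, Graby uses its default configuration.
    $graby = new Graby();

    // Graby reports fetch errors in the result instead of throwing an exception.
    return $graby->fetchContent($url);
}
```

Calling fetchArticle() with an article URL and passing the result to var_dump() is a quick way to see which fields your installed version returns.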
If an error occurs while fetching the URL, Graby won't throw an exception but will return information about the error (at least the status code).
The date result is the same as displayed in the content. If date is not null in the result, we recommend parsing it using date_parse (this is what we use to validate that the date is correct).
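The check below is one way to act on that recommendation. It is a sketch only: isUsableDate() is a hypothetical helper name, and its rules (no parse errors, a complete year/month/day, a valid calendar date) are an assumption about what "correct" should mean, not a copy of Graby's internal validation.

```php
<?php

/**
 * Hypothetical helper: return true when date_parse() finds a complete,
 * error-free calendar date in the given string.
 */
function isUsableDate(?string $date): bool
{
    if (null === $date || '' === trim($date)) {
        return false;
    }

    $parsed = date_parse($date);

    // date_parse() never throws; it reports problems in error_count.
    if ($parsed['error_count'] > 0) {
        return false;
    }

    // Components that could not be found are returned as false.
    if (false === $parsed['year'] || false === $parsed['month'] || false === $parsed['day']) {
        return false;
    }

    // Reject impossible dates such as month 13 or day 32.
    return checkdate($parsed['month'], $parsed['day'], $parsed['year']);
}
```

Pass the date value from the result to isUsableDate() before converting it to a DateTime or storing it.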
Retrieve content from a prefetched page
If you want to extract content from a page you fetched outside of Graby, you can call setContentAsPrefetched() before calling fetchContent().
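A minimal sketch of that call order follows. The helper name extractFromPrefetched() and the autoloader path are illustrative assumptions. The HTML is whatever you retrieved yourself, and the URL should be the address it came from.

```php
<?php

use Graby\Graby;

// Load Composer's autoloader when present (the path is an assumption about your project layout).
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require __DIR__ . '/vendor/autoload.php';
}

/**
 * Hypothetical helper: run Graby's extraction on HTML obtained elsewhere.
 */
function extractFromPrefetched(string $html, string $url)
{
    $graby = new Graby();

    // Hand over the page you already fetched...
    $graby->setContentAsPrefetched($html);

    // ...then call fetchContent() with the URL that page came from.
    return $graby->fetchContent($url);
}
```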
Cleanup content
Since version 1.9.0, you can also pass HTML content to be cleaned up in the same way Graby cleans content retrieved from a URL. The URL is still needed to convert links to absolute ones, etc.
Use custom handler & formatter to see output log
You can use them to display the Graby output log to the end user. They are intended to be used in a Symfony project using Monolog.
Define the graby handler service (somewhere in a service.yml):
Then define the Monolog handler in your app/config/config.yml:
You can then retrieve logs from graby in your controller using:
Timeout configuration
If you need to define a timeout, you must create the Http\Client\HttpClient manually, configure it and inject it into Graby\Graby.
- For Guzzle 7:
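As a sketch for Guzzle 7, assuming php-http/guzzle7-adapter is installed as described in the installation section: the helper name createGrabyWithTimeout() and the autoloader path are illustrative assumptions. The timeout key is Guzzle's own request option (total request time in seconds), and the adapter class wraps the configured Guzzle client so it can be injected as Graby's HTTP client.

```php
<?php

use Graby\Graby;
use GuzzleHttp\Client as GuzzleClient;
use Http\Adapter\Guzzle7\Client as GuzzleAdapter;

// Load Composer's autoloader when present (the path is an assumption about your project layout).
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require __DIR__ . '/vendor/autoload.php';
}

/**
 * Hypothetical helper: build a Graby instance whose HTTP requests
 * give up after the given number of seconds.
 */
function createGrabyWithTimeout(float $seconds): Graby
{
    // Configure the timeout on the Guzzle client itself.
    $guzzle = new GuzzleClient(['timeout' => $seconds]);

    // First argument: Graby configuration (an empty array keeps the defaults).
    // Second argument: the HTTP client, here Guzzle wrapped in the HTTPlug adapter.
    return new Graby([], new GuzzleAdapter($guzzle));
}
```

The instance returned by createGrabyWithTimeout(2.0) is used like any other Graby instance; only the HTTP client behind it changes.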
Full configuration
This is the full documented configuration and also the default one.
Credits
- FiveFilters for Full-Text-RSS
- Caneco for the awesome logo ✨
All versions of graby with dependencies
ext-curl Version *
ext-tidy Version *
fossar/htmlawed Version ^1.2.7
http-interop/http-factory-guzzle Version ^1.1
j0k3r/graby-site-config Version ^1.0.181
j0k3r/httplug-ssrf-plugin Version ^2.0
j0k3r/php-readability Version ^1.2.10
monolog/monolog Version ^1.18.0|^2.0
php-http/client-common Version ^2.7
php-http/discovery Version ^1.19
php-http/httplug Version ^2.4
php-http/message Version ^1.14
simplepie/simplepie Version ^1.7
smalot/pdfparser Version ^1.1
symfony/options-resolver Version ^3.4|^4.4|^5.3|^6.0|^7.0
true/punycode Version ^2.1
guzzlehttp/psr7 Version ^1.5.0|^2.0