Download the PHP package scotteh/php-goose without Composer
On this page you can find all versions of the php package scotteh/php-goose. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package php-goose
Short Description Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
License Apache-2.0
Homepage https://github.com/scotteh/php-goose
Information about the package php-goose
PHP Goose - Article Extractor
Note
This repository has been archived as of 2023-09-05.
Intro
PHP Goose is a port of Goose, which was originally developed in Java and later converted to Scala by GravityLabs. Portions have also been ported from python-goose, the Python port. Its mission is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.
The goal is to get the purest possible extraction from the beginning of the article, to serve Flipboard/Pulse-type applications that need to show the first snippet of a web article along with an image.
Goose will try to extract the following information:
- Main text of an article
- Main image of article
- Any YouTube/Vimeo movies embedded in article
- Meta Description
- Meta tags
- Publish Date
The PHP version was rewritten by:
- Andrew Scott
Requirements
- PHP 7.1 or later
- PSR-4 compatible autoloader
The older 0.x versions with PHP 5.5+ support are still available under releases.
Install
This library is designed to be installed via Composer.
Add scotteh/php-goose to the require section of your project's composer.json.
Download composer.phar from getcomposer.org.
Install the library by running php composer.phar install.
Autoloading
This library requires an autoloader. If you aren't already using one, you can include Composer's autoloader.
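A minimal sketch of including Composer's autoloader, assuming the script sits in the project root next to the vendor/ directory that Composer creates. The loadComposerAutoloader() helper is hypothetical and only illustrates the idea; a bare require of vendor/autoload.php does the same job.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of php-goose): loads Composer's autoloader
// from the given project root and reports whether the file was found.
function loadComposerAutoloader(string $projectRoot): bool
{
    $autoloader = rtrim($projectRoot, '/\\') . '/vendor/autoload.php';
    if (!is_file($autoloader)) {
        return false;
    }
    require_once $autoloader;
    return true;
}

// Assumes this script lives in the project root, alongside vendor/.
if (!loadComposerAutoloader(__DIR__)) {
    fwrite(STDERR, "vendor/autoload.php not found; run Composer's install step first.\n");
}
```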
Usage
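A minimal usage sketch, assuming the library exposes a Goose\Client class with an extractContent() method and the article getters shown here. These names are recalled from the project's README and should be checked against the installed version. The summarizeArticle() helper and the extract.php file name are hypothetical.

```php
<?php
declare(strict_types=1);

// Assumes Composer's autoloader lives in vendor/ next to this script.
$autoloader = __DIR__ . '/vendor/autoload.php';
if (is_file($autoloader)) {
    require_once $autoloader;
}

// Hypothetical helper: fetches a URL and returns a few of the extracted fields.
// Class and method names are assumptions based on the php-goose README.
function summarizeArticle(string $url): array
{
    $goose   = new \Goose\Client();
    $article = $goose->extractContent($url);

    // The top image may be missing when no suitable candidate is found.
    $topImage = $article->getTopImage();

    return [
        'title'       => $article->getTitle(),
        'description' => $article->getMetaDescription(),
        'text'        => $article->getCleanedArticleText(),
        'image'       => $topImage !== null ? $topImage->getImageSrc() : null,
        'videos'      => $article->getVideos(),
        'publishDate' => $article->getPublishDate(),
    ];
}

// Run as: php extract.php <article-url>
if (PHP_SAPI === 'cli' && isset($argv[1]) && class_exists(\Goose\Client::class)) {
    print_r(summarizeArticle($argv[1]));
}
```

Wrapping the calls in one function keeps the network request out of the script's top level, so nothing is fetched unless a URL is passed on the command line.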
Configuration
All configuration options are optional; each falls back to a default value when omitted.
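A sketch of passing options to the client constructor. The option keys and default values are recalled from the project's README and are assumptions here; verify them against the version you install before relying on them.

```php
<?php
declare(strict_types=1);

// Assumes Composer's autoloader lives in vendor/ next to this script.
$autoloader = __DIR__ . '/vendor/autoload.php';
if (is_file($autoloader)) {
    require_once $autoloader;
}

// Option names and defaults as recalled from the php-goose README (assumption).
$config = [
    'language'         => 'en',      // selects the common-word dictionary
    'image_min_bytes'  => 4500,      // minimum image size in bytes
    'image_max_bytes'  => 5242880,   // maximum image size in bytes
    'image_min_width'  => 120,       // minimum image width in pixels
    'image_min_height' => 120,       // minimum image height in pixels
    'image_fetch_best' => true,      // look for the best image candidate
    'image_fetch_all'  => false,     // collect every image in the article
    'browser'          => [          // passed through to Guzzle
        'timeout'         => 60,
        'connect_timeout' => 30,
    ],
];

// Only construct the client when the library is actually installed.
if (class_exists(\Goose\Client::class)) {
    $goose = new \Goose\Client($config);
}
```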
Licensing
PHP Goose is licensed by Gravity.com under the Apache 2.0 license; see the LICENSE file for more details.
All versions of php-goose with dependencies
- ext-mbstring: *
- ext-libxml: *
- lib-libxml: >=2.7.7
- guzzlehttp/guzzle: ^6.0|^7.0
- jakeasmith/http_build_url: 1.0.*
- scotteh/php-dom-wrapper: ^2.0