Download the PHP package pforret/pf-article-extractor without Composer
On this page you can find all versions of the php package pforret/pf-article-extractor. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Information about the package pf-article-extractor
pforret/pf-article-extractor
Boilerplate Removal and Fulltext Extraction from HTML pages.
A rewrite of dotpack/php-boiler-pipe for PHP 8.2 and up, with tests.
Installation
Usage
Under the hood
- package accepts a full HTML page as input
- it will walk the DOM tree and try to find the main article content
- it will remove boilerplate content (like headers, footers, sidebars, ...)
- it will try to extract the main article content
- it will try to extract the title, date, images and links from the article
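The steps above can be sketched with PHP's built-in DOM extension. This is only an illustration under stated assumptions, not the package's actual algorithm or API: the function name `sketchExtractArticle`, the list of tags treated as boilerplate, and the "most paragraph text wins" scoring are all hypothetical. Date extraction is left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of pforret/pf-article-extractor.
 * Returns the title, main text, links and images of an HTML page.
 */
function sketchExtractArticle(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Title: taken from <title> (assumption; a real extractor may prefer <h1> or meta tags).
    $titleNode = $xpath->query('//title')->item(0);
    $title = $titleNode !== null ? trim($titleNode->textContent) : '';

    // Remove typical boilerplate containers (assumed tag list).
    $boilerplate = $xpath->query('//header | //footer | //nav | //aside | //script | //style');
    foreach (iterator_to_array($boilerplate) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Pick the container holding the most paragraph text.
    // Ties go to the later candidate, which tends to be the more deeply nested one.
    $best = null;
    $bestScore = 0;
    foreach ($xpath->query('//article | //main | //section | //div') as $candidate) {
        $score = 0;
        foreach ($xpath->query('.//p', $candidate) as $p) {
            $score += mb_strlen(trim($p->textContent));
        }
        if ($score > 0 && $score >= $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }
    // Fall back to <body> when no candidate contains paragraph text.
    $best ??= $xpath->query('//body')->item(0);

    $paragraphs = [];
    $links = [];
    $images = [];
    if ($best !== null) {
        foreach ($xpath->query('.//p', $best) as $p) {
            // Collapse runs of whitespace inside each paragraph.
            $text = trim(preg_replace('/\s+/', ' ', $p->textContent) ?? '');
            if ($text !== '') {
                $paragraphs[] = $text;
            }
        }
        foreach ($xpath->query('.//a[@href]', $best) as $a) {
            $links[] = $a->getAttribute('href');
        }
        foreach ($xpath->query('.//img[@src]', $best) as $img) {
            $images[] = $img->getAttribute('src');
        }
    }

    return [
        'title'  => $title,
        'text'   => implode("\n\n", $paragraphs),
        'links'  => $links,
        'images' => $images,
    ];
}
```

Calling `sketchExtractArticle($html)` on a fetched page returns an array with `title`, `text`, `links` and `images` keys. A tag-based heuristic like this is much cruder than a text-density classifier, so it should be read as a way to picture the pipeline rather than as a replacement for the package.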
Right now it is tested with example pages for:
- Blogger
- Drupal
- Jekyll
- Mkdocs
- Wix
- WordPress
Similar packages
- beautifulsoup4 - Python, MIT
- html-text - Python, MIT
- kohlschutter/boilerpipe - Java, Apache 2.0
- fivefilters/readability.php - PHP, GPL-3.0
- miso-belica/jusText - Python, BSD2
- codelucas/newspaper - Python, Apache
All versions of pf-article-extractor with dependencies
Requires
- php: ^8.2
- ext-dom: *
- ext-libxml: *
- ext-mbstring: *
- fivefilters/readability.php: ^3.2