News Scrapper
This library extracts article/news information from a webpage, including the title, main image, description, author, keywords, publish date and, where possible, the body.
This library supports scraping via standard structured metadata, such as Microdata and the hAtom Microformat, along with custom selectors that can be specified to support unstructured webpages.
News-Scrapper requires PHP >= 5.4
How to Install
You can install this library with Composer. Add the following to your composer.json manifest file:
{
    "require": {
        "zrashwani/news-scrapper": "1.*"
    }
}
Then run composer install.
How to Use
Here's a quick example of how to scrape news data from a webpage:
By default, the scraper tries to guess the best structured data adapter and applies it.
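To make the idea of "guessing" an adapter concrete, here is a minimal sketch of how a page could be probed for the structured formats named above. It is not the library's detection logic: the function name guessStructuredFormat() and the returned labels are hypothetical, and only PHP's built-in DOM extension is used.

```php
<?php
// Illustrative only: NOT the library's implementation.
// guessStructuredFormat() and its return labels are hypothetical names.

function guessStructuredFormat($html)
{
    $doc = new DOMDocument();
    // Collect libxml warnings (HTML5 tags, malformed markup) instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Each format is recognised by a characteristic marker in the markup.
    $markers = [
        // Microdata: an element carrying itemscope and a schema.org item type.
        'Microdata' => '//*[@itemscope and contains(@itemtype, "schema.org")]',
        // hAtom: an element whose class list contains the token "hentry".
        'hAtom'     => '//*[contains(concat(" ", normalize-space(@class), " "), " hentry ")]',
    ];

    foreach ($markers as $format => $query) {
        if ($xpath->query($query)->length > 0) {
            return $format;
        }
    }

    // Nothing structured found: fall back to plain meta tags.
    return 'Default';
}

$html = '<html><body><article itemscope itemtype="http://schema.org/NewsArticle">'
      . '<h1 itemprop="headline">Example headline</h1></article></body></html>';

echo guessStructuredFormat($html), PHP_EOL; // prints "Microdata"
```

The first marker that matches wins, so the order of the $markers array acts as the priority order in this sketch.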
Scraping Structured Data
You can select a specific adapter to be used for extracting the data as follows:
Here is the list of supported structured data adapters or scraping modes:
Scraping Unstructured Data
If the webpage doesn't follow any standard structured data format, you can still scrape news information by specifying an XPath or CSS selector for each article part, such as the title, description, image and body.
The custom scraping adapter, CustomAdapter, supports method chaining for setting the selectors.
If a selector is not specified, the default selector from DefaultAdapter is used (DefaultAdapter is the HTML adapter, which depends on standard meta tags).
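The combination of chained selector setters and meta-tag defaults can be sketched as follows. This is an independent illustration, not CustomAdapter itself: the class SelectorExtractor, its methods and the field names are hypothetical, and the sketch accepts XPath expressions only (no CSS selectors).

```php
<?php
// Illustrative only: SelectorExtractor is a hypothetical class,
// not the library's CustomAdapter or DefaultAdapter.

class SelectorExtractor
{
    // Default XPath expressions based on standard HTML meta tags.
    private $selectors = [
        'title'       => '//title',
        'description' => '//meta[@name="description"]/@content',
        'author'      => '//meta[@name="author"]/@content',
    ];

    // Override (or add) the XPath expression for one field.
    public function setSelector($field, $xpath)
    {
        $this->selectors[$field] = $xpath;
        return $this; // returning $this is what enables method chaining
    }

    // Return an array of field => first matching text, or null when nothing matches.
    public function extract($html)
    {
        $doc = new DOMDocument();
        libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        $xpath = new DOMXPath($doc);

        $result = [];
        foreach ($this->selectors as $field => $query) {
            $nodes = $xpath->query($query);
            $result[$field] = ($nodes !== false && $nodes->length > 0)
                ? trim($nodes->item(0)->nodeValue)
                : null;
        }
        return $result;
    }
}

$html = '<html><head><title>Site title</title>'
      . '<meta name="description" content="Short summary"></head>'
      . '<body><h1 class="headline"> Real headline </h1>'
      . '<div id="story">Article text.</div></body></html>';

$data = (new SelectorExtractor())
    ->setSelector('title', '//h1[@class="headline"]')
    ->setSelector('body', '//div[@id="story"]')
    ->extract($html);

print_r($data);
```

Here the title and body come from the custom selectors, the description falls back to the meta tag, and the author is null because the page has no author meta tag.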
Scraping a Group of Links
To scrape a group of news articles from a page containing news links, use the scrapLinkGroup method.
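The first half of such a workflow, collecting article URLs from a listing page, might look like the sketch below. It does not show scrapLinkGroup or its signature: collectArticleLinks() and its parameters are hypothetical, and URL resolution is deliberately simplified.

```php
<?php
// Illustrative only: collectArticleLinks() is a hypothetical helper,
// not part of the library and not the scrapLinkGroup method.

// $linkQuery must be an XPath expression that selects <a> elements.
function collectArticleLinks($html, $linkQuery, $baseUrl, $limit = 10)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Simplification: the origin is scheme + host only (no port).
    $base   = parse_url($baseUrl);
    $origin = $base['scheme'] . '://' . $base['host'];

    $links = [];
    foreach ($xpath->query($linkQuery) as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        if (parse_url($href, PHP_URL_SCHEME) === null) {
            // Simplification: every relative link is treated as root-relative;
            // protocol-relative and document-relative URLs are not handled.
            $href = $origin . '/' . ltrim($href, '/');
        }
        $links[$href] = true; // array keys de-duplicate repeated links
        if (count($links) >= $limit) {
            break;
        }
    }
    return array_keys($links);
}

$listing = '<ul class="news">'
         . '<li><a href="/news/1">One</a></li>'
         . '<li><a href="https://example.com/news/2">Two</a></li>'
         . '<li><a href="/news/1">One again</a></li>'
         . '</ul>';

$urls = collectArticleLinks($listing, '//ul[@class="news"]//a', 'https://example.com/latest');
print_r($urls); // two unique absolute URLs: .../news/1 and .../news/2
```

Each collected URL would then be fetched and passed through the article extraction step; the $limit parameter caps how many links are followed.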
How to Contribute
- Fork this repository
- Create a new branch for each feature or improvement
- Send a pull request from each feature branch
It is very important to put each new feature or improvement on its own feature branch and to send a pull request for each branch. This allows me to review and pull in new features or improvements individually.
All pull requests must adhere to the PSR-2 standard.
System Requirements
- PHP 5.4.0+
License
MIT Public License