Download the PHP package baraveli/rss-scraper without Composer
On this page you can find all versions of the php package baraveli/rss-scraper. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package rss-scraper
Short Description RSS scraper for scraping RSS feeds from Dhivehi news sites
License MIT
Information about the package rss-scraper
RSS Scraper
An RSS scraper that scrapes RSS feeds from news websites.
:rocket: Installation
Usage
After installing the package, create a config.json file inside your application and list the sites you want to index.
:satellite: RSS Scraper Specs
This documentation describes the structure and usage of the RSS scraper and how the individual components of the library work.
:crystal_ball: General Explanation
The RSS scraper reads the feed URLs of the news sites from the configuration, fetches the items of each feed, and returns the data as a JSON response or an array.
:hammer: Config loader
The RSS scraper configuration is stored in the configs directory as a config.json file. The config file lists the RSS feeds that the scraper requests and scrapes.
Example config:
This configuration file loads the RSS feed of Vaguthu.
That is all the configuration file needs. The RSS scraper has a utility class, ConfigLoader, that loads the configuration data from the configs directory and returns the RSS feed URLs as an array.
The ConfigLoader class has a single static load method, which takes a filename as a string argument. The filename is the name of the JSON file inside the configs directory; in this case it is config. If the given file is not found, the load method throws an exception with the message "Error reading the config file or it is empty."
Config loader class is shown below:
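As a rough sketch of the behaviour described above (not the package's actual source), a loader along these lines would read a JSON file from a configs directory and throw when the file is missing or empty. The `$directory` parameter, the use of `RuntimeException`, and the sample feed name and URL are assumptions made for illustration.

```php
<?php

// Hypothetical sketch of a config loader; names other than load() are assumed.
final class ConfigLoader
{
    /**
     * Load <directory>/<filename>.json and return its contents as an array.
     *
     * @param string $filename  Base name of the JSON file, without extension.
     * @param string $directory Assumed location of the configs directory.
     */
    public static function load(string $filename, string $directory = __DIR__ . '/configs'): array
    {
        $path = $directory . '/' . $filename . '.json';

        // A missing file and an unreadable file are treated the same way.
        $contents = is_file($path) ? file_get_contents($path) : false;
        $data = ($contents === false) ? null : json_decode($contents, true);

        // Invalid JSON decodes to null; an empty object decodes to [].
        if (!is_array($data) || $data === []) {
            throw new RuntimeException('Error reading the config file or it is empty.');
        }

        return $data;
    }
}

// Usage: write a throwaway config, then load it by its base name.
$directory = sys_get_temp_dir() . '/rss-scraper-configs-' . uniqid();
mkdir($directory);
file_put_contents($directory . '/config.json', '{"example": "https://example.com/feed"}');

$feeds = ConfigLoader::load('config', $directory);
```

Folding the "missing" and "empty" cases into one check mirrors the single error message quoted above.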
:flashlight: Http Client
The Client class inside the Http directory of the RSS scraper sends an HTTP request to each RSS feed URL specified in the config and retrieves its content. The class's get method fetches the content of the RSS URL and checks whether the returned data is valid XML. isValidXmL() is a helper method provided by the helper trait. If the validity check passes, the XML is passed to simplexml_load_string(), a function built into PHP. The loaded result is then passed to the parseXML method, which converts the XML into a PHP array. That array is then returned.
This class uses Guzzle to make the HTTP request.
Client class is shown below:
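The validate-then-parse steps described above can be sketched as follows. This is an illustration rather than the package's source: the HTTP request (made with Guzzle in the package) is left out and the response body is passed in as a string, and the trait name `Helper`, the class name `FeedParser`, the method name `parse`, and the spelling `isValidXml` are assumptions.

```php
<?php

// Hypothetical helper trait providing the XML validity check.
trait Helper
{
    public function isValidXml(string $content): bool
    {
        if (trim($content) === '') {
            return false;
        }

        // Collect libxml errors internally instead of emitting PHP warnings.
        $previous = libxml_use_internal_errors(true);
        $document = simplexml_load_string($content);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        return $document !== false;
    }
}

// Hypothetical stand-in for the parsing half of the Client class.
final class FeedParser
{
    use Helper;

    /** Turn the body of a feed response into a PHP array. */
    public function parse(string $content): array
    {
        if (!$this->isValidXml($content)) {
            throw new InvalidArgumentException('The feed did not return valid XML.');
        }

        // LIBXML_NOCDATA merges CDATA sections into plain text nodes.
        $xml = simplexml_load_string($content, 'SimpleXMLElement', LIBXML_NOCDATA);

        return $this->parseXML($xml);
    }

    /** Convert a SimpleXMLElement to an array via a JSON round trip. */
    private function parseXML(SimpleXMLElement $xml): array
    {
        return json_decode(json_encode($xml), true);
    }
}

// Usage with an inline feed body instead of a live HTTP response.
$body = '<rss version="2.0"><channel><title>Example</title>'
    . '<item><title>First</title><link>https://example.com/1</link></item>'
    . '</channel></rss>';

$feed = (new FeedParser())->parse($body);
```

One consequence of the JSON round trip is that a feed with a single `item` yields an associative array for that item, while a feed with several items yields a list of them, so callers may need to normalise the two cases.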
:page_facing_up: Article Collection
ArticleCollection is a class responsible for gathering everything into a collection so that the result can easily be manipulated as an array or as JSON. The class has an items array that holds all the items. Items are added through the add method, which takes a value. The class also has a jsonify() method, which converts the collection to JSON, and a toArray() method, which converts it to an array. The count method returns the number of items in the items array.
Article Collection class is shown below:
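A minimal sketch of such a collection, based only on the description above and not on the package's source, might look like this. Implementing `Countable`, the JSON encoding flags, and the sample article fields are assumptions.

```php
<?php

// Hypothetical sketch of the collection; method names follow the description.
final class ArticleCollection implements Countable
{
    /** @var array<int, mixed> Items in insertion order. */
    private array $items = [];

    /** Append a value to the collection. */
    public function add($value): void
    {
        $this->items[] = $value;
    }

    /** Return the items as a plain PHP array. */
    public function toArray(): array
    {
        return $this->items;
    }

    /** Return the items encoded as a JSON string. */
    public function jsonify(): string
    {
        return json_encode(
            $this->items,
            JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
        );
    }

    /** Number of items held; also enables count($collection). */
    public function count(): int
    {
        return count($this->items);
    }
}

// Usage: collect two articles, then read them back in either form.
$articles = new ArticleCollection();
$articles->add(['title' => 'First', 'link' => 'https://example.com/1']);
$articles->add(['title' => 'Second', 'link' => 'https://example.com/2']);

$asArray = $articles->toArray();
$asJson = $articles->jsonify();
```

`JSON_UNESCAPED_UNICODE` is used here on the assumption that Dhivehi text should stay readable in the JSON output rather than being escaped as `\u` sequences.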