Download the PHP package kaishiyoku/hera-rss-crawler without Composer
On this page you can find all versions of the php package kaishiyoku/hera-rss-crawler. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package hera-rss-crawler
Short Description Modern library to handle RSS/Atom feeds
License MIT
Homepage https://github.com/kaishiyoku/hera-rss-crawler
Information about the package hera-rss-crawler
About
This project tries to make fetching and parsing RSS feeds easier. With Hera RSS you can discover, fetch and parse RSS feeds.
Installation
- simply run
composer require kaishiyoku/hera-rss-crawler
- create a new crawler instance using
$heraRssCrawler = new HeraRssCrawler()
- discover a feed, for example
$feedUrls = $heraRssCrawler->discoverFeedUrls('https://laravel-news.com/')
- pick the feed you would like to use; if multiple feeds were discovered, pick one of them
- fetch the feed:
$feed = $heraRssCrawler->parseFeed($feedUrls->get(0))
- fetch the articles:
$feedItems = $feed->getFeedItems()
Breaking Changes
Version 6.x
- dropped support for PHP 8.0
Version 5.x
- dropped support for PHP 7.4
Version 4.x
- dropped support for Laravel 8
Version 3.x
- FeedItem method jsonSerialize has been renamed to toJson and doesn't return null anymore but throws a JsonException if the serialized JSON is invalid.
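The difference between returning a fallback value and throwing a JsonException can be illustrated with plain PHP. The following is only a sketch: ExampleItem is a hypothetical stand-in and not the library's FeedItem class, and the library's actual implementation of toJson may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a feed item; NOT the library's FeedItem class.
final class ExampleItem
{
    public function __construct(private string $title) {}

    public function toJson(): string
    {
        // JSON_THROW_ON_ERROR makes json_encode throw a JsonException
        // instead of returning false when encoding fails.
        return json_encode(['title' => $this->title], JSON_THROW_ON_ERROR);
    }
}

$valid = (new ExampleItem('Hello'))->toJson();

try {
    // "\xB1\x31" is not valid UTF-8, so encoding fails.
    (new ExampleItem("\xB1\x31"))->toJson();
    $failed = false;
} catch (JsonException $exception) {
    $failed = true;
}
```

Callers that previously checked for null would instead wrap the call in a try/catch block like the one above.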
Available crawler options
Determines how many retries parsing or discovering feeds will be made when an exception occurs, e.g. if the feed was unreachable.
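A retry count of this kind could behave roughly as in the sketch below. The helper withRetries and the simulated fetch are hypothetical and only illustrate the general idea; the crawler's internal retry logic may work differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): run $operation and retry
 * up to $retryCount additional times when it throws.
 */
function withRetries(callable $operation, int $retryCount): mixed
{
    $attempt = 0;

    while (true) {
        try {
            return $operation();
        } catch (Exception $exception) {
            // Give up once the allowed number of retries has been used.
            if ($attempt >= $retryCount) {
                throw $exception;
            }
            $attempt++;
        }
    }
}

// Simulated fetch that is "unreachable" twice before it succeeds.
$calls = 0;
$fetch = function () use (&$calls): string {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('feed unreachable');
    }

    return 'feed content';
};

$result = withRetries($fetch, 2);
```

With a retry count of 2 the operation is attempted at most three times in total: the initial call plus two retries.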
Set your own logger instance, e.g. a simple file logger.
Useful for websites which redirect to another subdomain when visiting the site, e.g. for Reddit.
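One way to picture such a replacement is a map from an original URL prefix to the prefix that should be used instead. The sketch below is an assumption about the general technique, not the library's code; the helper applyUrlReplacements and the concrete subdomains in the map are illustrative only.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: rewrite the start of a URL using a replacement map
 * of the form [original prefix => replacement prefix].
 *
 * @param array<string, string> $replacementMap
 */
function applyUrlReplacements(string $url, array $replacementMap): string
{
    foreach ($replacementMap as $search => $replace) {
        if (str_starts_with($url, $search)) {
            return $replace . substr($url, strlen($search));
        }
    }

    // No prefix matched, so the URL is returned unchanged.
    return $url;
}

// Assumed example mapping; the subdomains are illustrative.
$map = ['https://www.reddit.com' => 'https://old.reddit.com'];

$rewritten = applyUrlReplacements('https://www.reddit.com/r/php', $map);
$untouched = applyUrlReplacements('https://laravel-news.com/', $map);
```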
With that you can set your own feed discoverers.
You can even write your own; just make sure to implement the FeedDiscoverer interface.
By default the crawler discovers feeds by content type, by HTML head elements, by HTML anchor elements and by Feedly, in that order.
The ordering is important because the discoverers are called sequentially, and the process stops as soon as at least one feed URL has been found.
That means that once a discoverer has found a feed, the remaining discoverers won't be called.
If you want to mainly discover feeds by using HTML anchor elements, the FeedDiscovererByHtmlAnchorElements discoverer should be the first discoverer in the collection.
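The "first discoverer that finds something wins" behaviour can be sketched as follows. ExampleDiscoverer, FixedDiscoverer and discoverSequentially are hypothetical names used only for illustration; the library's FeedDiscoverer interface has its own method signature, which is not reproduced here.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified interface; the library's FeedDiscoverer may differ.
interface ExampleDiscoverer
{
    /** @return string[] feed URLs found for the given website URL */
    public function discover(string $url): array;
}

// Test double that returns a fixed result and records whether it was called.
final class FixedDiscoverer implements ExampleDiscoverer
{
    public bool $called = false;

    /** @param string[] $feedUrls */
    public function __construct(private array $feedUrls) {}

    public function discover(string $url): array
    {
        $this->called = true;

        return $this->feedUrls;
    }
}

/**
 * Call each discoverer in order and stop at the first non-empty result.
 *
 * @param ExampleDiscoverer[] $discoverers
 * @return string[]
 */
function discoverSequentially(array $discoverers, string $url): array
{
    foreach ($discoverers as $discoverer) {
        $feedUrls = $discoverer->discover($url);
        if ($feedUrls !== []) {
            return $feedUrls;
        }
    }

    return [];
}

$first = new FixedDiscoverer([]);
$second = new FixedDiscoverer(['https://example.com/feed']);
$third = new FixedDiscoverer(['https://example.com/rss']);

$found = discoverSequentially([$first, $second, $third], 'https://example.com/');
```

Here the third discoverer is never called because the second one already returned a feed URL, which is why the position of a discoverer in the collection matters.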
Available crawler methods
Simply fetch and parse the feed of a given feed URL. If no consumable RSS feed is found, null is returned.
Discover feeds from a website URL and return all parsed feeds in a collection.
Discover feeds from a website URL and return all found feed URLs in a collection. There are multiple ways the crawler tries to discover feeds. The order is as follows:
- discover feed URLs by content type: if the given URL is already a valid feed, return this URL
- discover feed URLs by HTML head elements: find all feed URLs inside an HTML document
- discover feed URLs by HTML anchor elements: get all anchor elements of an HTML document and return the URLs of those which include rss in their URLs
- discover feed URLs by Feedly: fetch feed URLs using the Feedly API
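As an illustration of the anchor-element strategy, the sketch below collects the href values of all anchors whose URL contains "rss", using PHP's DOM extension. The function findRssAnchorUrls is hypothetical and is not the library's implementation, which may filter or normalize URLs differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: return the href values of all anchor elements
 * whose URL contains "rss".
 *
 * @return string[]
 */
function findRssAnchorUrls(string $html): array
{
    // Collect libxml warnings internally; real-world HTML is rarely perfect.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($document->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (str_contains($href, 'rss')) {
            $urls[] = $href;
        }
    }

    // Remove duplicates while keeping a plain list.
    return array_values(array_unique($urls));
}

$html = '<html><body><a href="/about">About</a><a href="/rss.xml">Feed</a></body></html>';
$found = findRssAnchorUrls($html);
```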
Fetch the favicon of the feed's website. If none is found, null is returned.
Check if a given URL is a consumable RSS feed.
Contribution
Found any issues or have an idea to improve the crawler? Feel free to open an issue or submit a pull request.
Plans for the future
- [ ] add a Laravel facade
Author
Email: [email protected]
Website: https://andreas-wiedel.de
All versions of hera-rss-crawler with dependencies
ext-json Version *
ext-dom Version *
ext-simplexml Version *
ext-libxml Version *
symfony/dom-crawler Version ^5.4.21|^6.2.7
symfony/css-selector Version ^5.4.21|^6.2.7
guzzlehttp/guzzle Version ^7.5.0
illuminate/support Version ^9.0|^10.0|^11.0
nesbot/carbon Version ^2.66.0
laminas/laminas-xml Version ^1.5.0
laminas/laminas-feed Version ^2.20.0
monolog/monolog Version ^2.9.1|^3.3.1