Download the PHP package agentsquidflaps/web-scraper without Composer
On this page you can find all versions of the php package agentsquidflaps/web-scraper. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: web-scraper
Short description: Scrape website sitemaps for HTML elements
License: MIT
Homepage: https://github.com/agentsquidflaps/web-scraper
Information about the package web-scraper
Getting started
Install
composer require agentsquidflaps/web-scraper
Requirements
- PHP 7.2 or greater
- ext-json
- ext-simplexml
- symfony/dom-crawler 4 or greater
- symfony/css-selector 4 or greater
Documentation
See below for basic usage, or go to https://agentsquidflaps.github.io/web-scraper/#/ for more information.
Usage
Basic usage...
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])->getData();
...this will simply output the HTML for all pages in your sitemap in JSON format.
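To give a feel for the first step, reading page URLs out of a sitemap, here is a minimal sketch using SimpleXML (one of the listed requirements). It is not the package's own code; the helper name extractSitemapUrls and the sample XML are made up for illustration.

```php
<?php
// Hypothetical helper, not part of the package: pull page URLs out of sitemap XML.
function extractSitemapUrls(string $xml): array
{
    $sitemap = simplexml_load_string($xml);
    if ($sitemap === false) {
        // Malformed XML: nothing to scrape.
        return [];
    }

    $urls = [];
    // Each <url> entry in a sitemap holds the page address in its <loc> child.
    foreach ($sitemap->url as $entry) {
        $urls[] = trim((string) $entry->loc);
    }

    return $urls;
}

// Sample sitemap, invented for this example.
$xml = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.yoursite.com/</loc></url>
  <url><loc>https://www.yoursite.com/about</loc></url>
</urlset>
XML;

$urls = extractSitemapUrls($xml);
```

Each URL found this way would then be fetched and its HTML collected.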
You can also target specific elements on a page...
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])
->setElements([
    '.btn',
    'table'
])
->getData();
...and instead of returning the whole page, it'll return only the elements on each page that match the selectors provided.
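As a rough illustration of what matching elements in a page involves, the sketch below uses PHP's built-in DOMDocument and DOMXPath rather than symfony/dom-crawler, so it runs without extra packages. The helper name matchElements and the sample HTML are assumptions; the XPath expressions are hand-written equivalents of the '.btn' and 'table' selectors.

```php
<?php
// Hypothetical helper, not the package's code: return the outer HTML of every
// element matching an XPath expression.
function matchElements(string $html, string $xpathExpression): array
{
    $document = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ((new DOMXPath($document))->query($xpathExpression) as $node) {
        $matches[] = $document->saveHTML($node);
    }

    return $matches;
}

// Sample page, invented for this example.
$html = '<html><body><a class="btn primary" href="/buy">Buy</a>'
      . '<p>Text</p><table><tr><td>1</td></tr></table></body></html>';

// XPath equivalents of the CSS selectors ".btn" and "table".
$buttons = matchElements($html, "//*[contains(concat(' ', normalize-space(@class), ' '), ' btn ')]");
$tables  = matchElements($html, '//table');
```

With symfony/css-selector installed, the selectors could be passed as plain CSS instead of being translated to XPath by hand.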
Save file
You can also save the data to a file. To do so just...
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])
->setFileLocation('somewhere.json')
->saveData();
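If you would rather write the file yourself, for example after post-processing the data, a minimal sketch with plain PHP could look like this. The helper saveAsJson and the shape of the $data array are assumptions, not something the package defines.

```php
<?php
// Hypothetical helper: encode scraped data as JSON and write it to disk.
function saveAsJson(array $data, string $fileLocation): void
{
    $json = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
    if ($json === false) {
        throw new RuntimeException('Could not encode data: ' . json_last_error_msg());
    }
    if (file_put_contents($fileLocation, $json) === false) {
        throw new RuntimeException("Could not write to {$fileLocation}");
    }
}

// The shape of this array is made up for illustration.
$data = ['https://www.yoursite.com/' => ['<a class="btn">Buy</a>']];
$file = sys_get_temp_dir() . '/somewhere.json';
saveAsJson($data, $file);
```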
Formats
You can also output the data in different formats. Supported formats are currently JSON, Array and CSV.
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])
->setFileLocation('somewhere.csv')
->setFormat(WebScraper::FORMAT_CSV)
->saveData();
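To show what CSV output means for HTML fragments, which contain quotes that need escaping, here is a small sketch built on PHP's fputcsv. The helper rowsToCsv and the column names are assumptions; the package's actual CSV layout may differ.

```php
<?php
// Hypothetical helper: turn a header and rows into a CSV string.
function rowsToCsv(array $header, array $rows): string
{
    // Write to an in-memory stream so fputcsv handles quoting for us.
    $handle = fopen('php://memory', 'w+');
    foreach (array_merge([$header], $rows) as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);

    return $csv;
}

// Column names and row content are invented for this example.
$csv = rowsToCsv(
    ['url', 'element'],
    [['https://www.yoursite.com/', '<a class="btn">Buy</a>']]
);
```

Double quotes inside the HTML are doubled in the output, so the file can be read back with str_getcsv or any spreadsheet tool.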
Disabling verify peer
You can disable peer verification when fetching the URLs to scrape, although keeping it enabled is highly recommended. Disabling it can be useful if the URLs in the sitemap have invalid, self-signed or missing SSL certificates.
(new WebScraper())->setSitemaps([
    'http://www.yoursite.com/sitemap.xml'
])
->setFileLocation('somewhere.csv')
->setFormat(WebScraper::FORMAT_CSV)
->setVerifyPeerEnabled(false)
->saveData();
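For context, this is roughly what disabling peer verification looks like in plain PHP when fetching over HTTPS with stream functions. Whether the package uses streams or another HTTP client internally is not stated here, so treat this as an assumption; buildSslContext is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a stream context with peer verification toggled.
function buildSslContext(bool $verifyPeer)
{
    return stream_context_create([
        'ssl' => [
            // Check the certificate chain against trusted CAs.
            'verify_peer'      => $verifyPeer,
            // Check that the certificate matches the host name.
            'verify_peer_name' => $verifyPeer,
        ],
    ]);
}

$context = buildSslContext(false);
// The context would then be passed as the third argument to file_get_contents(), e.g.
// $html = file_get_contents('https://www.yoursite.com/', false, $context);
```

With verification off, connections are open to man-in-the-middle attacks, so only use this for sites you trust.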
All versions of web-scraper with dependencies
ext-json Version *
ext-simplexml Version *
symfony/dom-crawler Version ^4.0
symfony/css-selector Version ^4.0