Package scrapeninja-api-php-client
Short Description Web scraper API with proxy rotation, retries, and Chrome TLS fingerprint emulation
License MIT
ScrapeNinja Web scraper PHP API Client
This library is a thin Guzzle-based wrapper around the ScrapeNinja Web Scraping API.
What is ScrapeNinja?
Simple & high-performance web scraping API which:
- has 2 modes of website rendering:
  - scrape(): fast mode which emulates a Chrome TLS fingerprint without Puppeteer/Playwright overhead
  - scrapeJs(): full-fledged real Chrome with JavaScript rendering and basic interaction (clicking, filling in forms)
- is backed by rotating proxies (geos: US, EU, Brazil, France, Germany; 4G residential proxies available; your own proxy can be specified as well upon request)
- has smart retries and timeouts working out of the box
- allows extracting arbitrary data from raw HTML without dealing with PHP HTML parsing libraries: just pass an extractor function, written in JavaScript, and it will be executed on ScrapeNinja servers. ScrapeNinja uses Cheerio, a jQuery-like library, to extract data from HTML, and you can quickly build & test your extractor function in the Live Cheerio Sandbox. See /examples/extractor.php for an extractor which gets pure data from the HackerNews HTML source.
ScrapeNinja Full API Documentation
https://rapidapi.com/restyler/api/scrapeninja
ScrapeNinja Live Sandbox
ScrapeNinja allows you to quickly create and test your web scraper in the browser: https://scrapeninja.net/scraper-sandbox
Use cases
A popular use case for ScrapeNinja is when regular Guzzle/cURL fails to get the scraped website response reliably, even with headers fully identical to a real browser's, and gets 403 or 5xx errors instead.
Another major use case is when you want to avoid Puppeteer setup and maintenance but still need real JavaScript rendering instead of sending raw network requests.
ScrapeNinja helps reduce the amount of code needed for retrieving HTTP responses and dealing with retries, proxy handling, and timeouts.
Read more about ScrapeNinja:
https://pixeljets.com/blog/bypass-cloudflare/ https://scrapeninja.net
Get your free access key here:
https://rapidapi.com/restyler/api/scrapeninja
See the /examples folder for examples
Installation
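The package is published under the name restyler/scrapeninja-api-php-client, so a standard Composer install should work:

```bash
composer require restyler/scrapeninja-api-php-client
```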
Examples:
The /examples folder of this repo contains quick, ready-to-launch examples of how ScrapeNinja can be used.
To execute these examples in a terminal, retrieve your API key and then set it as an environment variable:
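For example, in a bash shell (the variable name RAPIDAPI_KEY is just a placeholder here; check the example files for the exact name they read):

```bash
export RAPIDAPI_KEY="your-rapidapi-key"
php examples/extractor.php
```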
Basic scrape request
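A minimal sketch of a basic fast-mode call. The scrape() method name comes from this README; the namespace, class name, and constructor options are assumptions, so check the library source for the exact ones:

```php
<?php
require 'vendor/autoload.php';

use ScrapeNinja\Client; // assumed class name; adjust to the library's actual namespace

// Constructor option name is an assumption; the client needs your RapidAPI key.
$client = new Client(['rapidapi_key' => getenv('RAPIDAPI_KEY')]);

// Fast mode: Chrome TLS fingerprint emulation, no real browser involved.
$response = $client->scrape([
    'url' => 'https://news.ycombinator.com/',
    'geo' => 'us', // optional proxy geo
]);

print_r($response); // contains the target page HTML plus response metadata
```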
Get full HTML rendered by real browser (Puppeteer) in PHP:
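A sketch of the same request in JS-rendering mode via scrapeJs() (real Chrome on the ScrapeNinja side); $client is constructed as in the basic example above, and the commented-out option is illustrative only:

```php
// Full Chrome with JavaScript rendering and basic interaction support.
$response = $client->scrapeJs([
    'url' => 'https://example.com/spa-page',
    // 'waitForSelector' => '.content', // illustrative; see the API docs for supported options
]);

print_r($response); // HTML as rendered by the real browser
```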
Extract data from raw HTML:
The response will contain a PHP array with pure data:
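A sketch of such an extractor call: the PHP side sends a JavaScript function as a string, ScrapeNinja executes it server-side with Cheerio, and the extracted structure comes back inside the response array. The extractor signature and the CSS selector are illustrative, and $client is constructed as in the basic example above:

```php
// The extractor is plain JavaScript executed on ScrapeNinja servers (Cheerio is jQuery-like).
$extractor = <<<'JS'
function extract(input, cheerio) {
    let $ = cheerio.load(input);
    return {
        titles: $('.titleline > a').map((i, el) => $(el).text()).get()
    };
}
JS;

$response = $client->scrape([
    'url'       => 'https://news.ycombinator.com/',
    'extractor' => $extractor,
]);

print_r($response); // extracted data is returned as a plain PHP array inside the response
```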
Sending POST requests
ScrapeNinja can perform POST requests.
Sending JSON POST
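A sketch of a JSON POST; the method, headers, and data parameter names mirror the public ScrapeNinja API and should be verified against the docs ($client as in the basic example):

```php
$response = $client->scrape([
    'url'     => 'https://example.com/api/login',
    'method'  => 'POST',
    'headers' => ['Content-Type: application/json'],
    'data'    => json_encode(['login' => 'user', 'password' => 'secret']),
]);
```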
Sending www-encoded POST
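And the www-encoded variant, a sketch under the same assumptions:

```php
$response = $client->scrape([
    'url'     => 'https://example.com/login',
    'method'  => 'POST',
    'headers' => ['Content-Type: application/x-www-form-urlencoded'],
    'data'    => http_build_query(['login' => 'user', 'password' => 'secret']),
]);
```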
Retries logic
In case of failure (target website timeout, proxy timeout, certain provider captcha pages), ScrapeNinja retries the request 2 times by default (so 3 requests in total). This behaviour can be modified or disabled.
ScrapeNinja can also be instructed to retry on specific HTTP response status codes or on text found in the response body (useful for custom captchas).
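For example, assuming the retry-related parameters are named retryNum, statusNotExpected, and textNotExpected as in the public ScrapeNinja API docs (verify before relying on them; $client as in the basic example):

```php
$response = $client->scrape([
    'url'               => 'https://example.com/',
    'retryNum'          => 3,                      // retry up to 3 times on failure
    'statusNotExpected' => [403, 503],             // retry (with another proxy) on these status codes
    'textNotExpected'   => ['captcha-error-page'], // retry if this text appears in the response body
]);
```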
Error handling
You should definitely wrap scrape() calls in a try/catch handler and log your errors: RapidAPI might go down, the ScrapeNinja server might go down, or the target website might go down.
- In case RapidAPI or ScrapeNinja are down, you will get a Guzzle exception, since Guzzle treats any non-200 response from the ScrapeNinja server as an unusual situation (which is good). You might get a 429 error if you exceed your plan limit.
- In case ScrapeNinja fails to get a "good" response even after 3 retries, it might throw a 503 error.
In all these cases, it is useful to get the HTTP response of the failure.
(see the /examples folder for a full error handling example)
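A minimal sketch of such a handler, assuming the client lets Guzzle's RequestException bubble up on non-200 responses (the client class name is again an assumption):

```php
<?php
require 'vendor/autoload.php';

use ScrapeNinja\Client; // assumed class name
use GuzzleHttp\Exception\RequestException;

$client = new Client(['rapidapi_key' => getenv('RAPIDAPI_KEY')]);

try {
    $response = $client->scrape(['url' => 'https://example.com/']);
    print_r($response);
} catch (RequestException $e) {
    // 429 = plan limit exceeded, 503 = ScrapeNinja gave up after its retries.
    if ($e->hasResponse()) {
        error_log('ScrapeNinja error ' . $e->getResponse()->getStatusCode());
        error_log((string) $e->getResponse()->getBody());
    }
    throw $e;
}
```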