Download the PHP package michaeluno/php-simple-web-scraper without Composer
On this page you can find all versions of the php package michaeluno/php-simple-web-scraper. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package php-simple-web-scraper
Short Description A PHP application which runs on Heroku and dumps web site outputs including JavaScript generated contents.
License MIT
Information about the package php-simple-web-scraper
PHP Simple Web Scraper
A PHP application for Heroku that dumps the output of a web site, including JavaScript-generated content.
Demo
Visit here. If the server is sleeping, it takes several seconds to wake up.
Usage
Basic Usage
Perform an HTTP request with the url query parameter, passing the URL-encoded target address as its value.
Example
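A minimal sketch of building such a request URL. The app address `https://your-app-name.herokuapp.com/` and the helper name `scraper_url` are placeholders for illustration, not part of the package; substitute the address of your own deployment.

```python
from urllib.parse import urlencode

# Hypothetical deployment address; replace with your own app's URL.
APP_BASE = "https://your-app-name.herokuapp.com/"

def scraper_url(target, extra=None):
    """Return a request URL whose `url` parameter holds the encoded target."""
    query = {"url": target}
    query.update(extra or {})
    # urlencode percent-encodes reserved characters such as ':', '/', '?' and '='.
    return APP_BASE + "?" + urlencode(query)

basic = scraper_url("https://example.com/?q=1")
print(basic)
```

Encoding the target matters because an unencoded `?` or `&` inside it would otherwise be read as part of the app's own query string.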
Parameters
output
Determines the output type: html, json, or screenshot.
html (default)
HTML source code of the target web site. JavaScript generated contents are also retrieved and dumped.
json
output=json
HTTP response data as JSON. Useful for cross domain communications with JSONP.
Example
screenshot
output=screenshot
A jpeg image of the site snapshot.
Example
file-type
When screenshot is given for the output parameter, the output file type can be set with the file-type parameter. Default: jpg.
It accepts the following values: pdf, png, jpg, jpeg, bmp, ppm.
width
When screenshot is given for the output parameter, width sets the screenshot image width.
height
When screenshot is given for the output parameter, height sets the screenshot image height. Leave it unset to get the full height. The default minimum height is 720 pixels.
Example
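As a sketch, the screenshot parameters described above can be combined in one query string. The app address is a placeholder, and the specific width and height values are only illustrative.

```python
from urllib.parse import urlencode

# Hypothetical deployment address; replace with your own app's URL.
APP_BASE = "https://your-app-name.herokuapp.com/"

params = {
    "url": "https://example.com/",
    "output": "screenshot",  # request an image instead of HTML
    "file-type": "png",      # one of: pdf, png, jpg, jpeg, bmp, ppm
    "width": 1280,           # illustrative value
    "height": 720,           # omit this key to capture the full height
}
screenshot_url = APP_BASE + "?" + urlencode(params)
print(screenshot_url)
```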
user-agent
Sets a custom user agent. By default, the user agent of the client accessing the app is used; pass a value with this parameter to override it.
If random is given, the user agent will be randomly assigned.
Example
An example user agent value: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100102 Firefox/57.0
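A sketch of passing a user agent string through the user-agent parameter. The app address is a placeholder; the user agent string contains spaces, slashes, and semicolons, so it needs to be encoded like any other query value.

```python
from urllib.parse import urlencode

# Hypothetical deployment address; replace with your own app's URL.
APP_BASE = "https://your-app-name.herokuapp.com/"

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) "
    "Gecko/20100102 Firefox/57.0"
)

# A fixed user agent string.
fixed_ua_url = APP_BASE + "?" + urlencode(
    {"url": "https://example.com/", "user-agent": USER_AGENT}
)

# The special value "random" asks the app to pick a user agent itself.
random_ua_url = APP_BASE + "?" + urlencode(
    {"url": "https://example.com/", "user-agent": "random"}
)
print(fixed_ua_url)
print(random_ua_url)
```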
load-images
Decides whether to load images. By default, this is disabled for the html and json output types and enabled for the screenshot output type.
Accepts a boolean value: true, false, 1, or 0.
Example
output-encoding
Sets the encoding used for the output. Default: utf8
cache-lifespan
All requests are cached for 20 minutes by default. This determines how long, in seconds, the cache should be retained. If you do not want a cached result or want to renew the cache, pass 0. Default: 1200.
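For instance, a request that bypasses the cache could be built as follows. This is only a sketch; the app address is a placeholder.

```python
from urllib.parse import urlencode

# Hypothetical deployment address; replace with your own app's URL.
APP_BASE = "https://your-app-name.herokuapp.com/"

# cache-lifespan=0 requests a fresh result instead of a cached one.
fresh_url = APP_BASE + "?" + urlencode(
    {"url": "https://example.com/", "cache-lifespan": 0}
)
print(fresh_url)
```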
headers
Sets custom HTTP headers. Accepts the value as an array.
Example
For instance, the DNT header value can be set this way.
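One way to pass an array in a query string is PHP's bracket convention, where `headers[DNT]=1` is parsed as an array with the key DNT. The sketch below assumes the app reads the headers parameter in that form, which this page does not spell out; the app address is a placeholder.

```python
from urllib.parse import urlencode

# Hypothetical deployment address; replace with your own app's URL.
APP_BASE = "https://your-app-name.herokuapp.com/"

# Assumption: the app expects PHP-style array keys such as headers[DNT].
custom_headers = {"DNT": "1"}

query = {"url": "https://example.com/"}
for name, value in custom_headers.items():
    query[f"headers[{name}]"] = value

# The brackets are percent-encoded as %5B and %5D; PHP decodes them
# before turning the parameter into an array.
headers_url = APP_BASE + "?" + urlencode(query)
print(headers_url)
```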
method
HTTP request method. Default: GET. Accepts the following values:
- OPTIONS
- GET
- HEAD
- POST
- PUT
- DELETE
- PATCH
When using POST, pass the data to send with the data request key. The program reads $_REQUEST[ 'data' ] to obtain the POST data.
Example
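A sketch of a request that asks the app to POST to the target. Because `$_REQUEST` in PHP includes query-string values, the data key is placed in the query string here. Whether data should be a plain string or an array is not stated on this page; the array form `data[name]`, the field name, and the app address below are all assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical deployment address; replace with your own app's URL.
APP_BASE = "https://your-app-name.herokuapp.com/"

query = {
    "url": "https://example.com/form",
    "method": "POST",
    # Assumption: form fields are passed as PHP-style array keys under `data`.
    "data[name]": "alice",
}
post_url = APP_BASE + "?" + urlencode(query)
print(post_url)
```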
Run as Heroku Application
This is a Heroku application, meant to be deployed to a Heroku app instance.
Requirements
- Heroku account
- Heroku CLI
- Git
Steps to Deploy
a) Quick Deploy
You may simply use the following button to deploy this application:
b) Manual Deploy
- Clone this repository to your local machine. Create a directory and, from there, type the following in a console window. This will download the repository files.
- Change the working directory to the cloned one.
- Log in to Heroku from the Heroku CLI.
- Create a new Heroku app. This gives something like this with a random app name. glacial-basin-46381 is the app name in the below example.
- Type the following. Replace {heroku-app-name} with your app name given in the above step.
- Upload the files to Heroku.
- Open the app in your browser.
All versions of php-simple-web-scraper with dependencies
ext-mbstring Version *
ext-json Version *
jakoch/phantomjs-installer Version 3.0.0 as 2.1.1-p08
jonnyw/php-phantomjs Version 4.6.1
michaeluno/php-classmap-generator Version 1.*