Download the PHP package duzun/hquery without Composer
Package: hquery
Short description: An extremely fast web scraper that parses megabytes of HTML in the blink of an eye. No dependencies. PHP5+
License: MIT
Homepage: https://duzun.me/playground/hquery
hQuery.php
An extremely fast and efficient web scraper that can parse megabytes of invalid HTML in the blink of an eye.
You can use the familiar jQuery/CSS selector syntax to easily find the data you need.
In my unit tests, I require it to be at least 10 times faster than Symfony's DOMCrawler on a 3 MB HTML document. In practice, according to my own tests, it is two to three orders of magnitude faster than DOMCrawler in some cases (especially when selecting thousands of elements), and on average it uses half as much RAM.
See tests/README.md.
💡 Features
- Very fast parsing and lookup
- Parses broken HTML
- jQuery-like style of DOM traversal
- Low memory usage
- Can handle big HTML documents (I have tested up to 20 MB, but the limit is the amount of RAM you have)
- Doesn't require cURL to be installed and automatically handles redirects (see hQuery::fromUrl())
- Caches response for multiple processing tasks
- PSR-7 friendly (see hQuery::fromHTML($message))
- PHP 5.3+
- No dependencies
🛠 Install
Just add this folder to your project and include_once 'hquery.php'; and you are ready to hQuery.
Alternatively, run composer require duzun/hquery,
or run npm install hquery.php and then use require_once 'node_modules/hquery.php/hquery.php';
⚙ Usage
Basic setup:
I would recommend using php-http/cache-plugin with a PSR-7 client for better flexibility.
Load HTML from a file
hQuery::fromFile( string $filename, boolean $use_include_path = false, resource $context = NULL )
Where $context is created with stream_context_create().
For an example of using $context to make an HTTP request through a proxy, see #26.
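A minimal sketch of such a context, assuming an HTTP proxy at the hypothetical address tcp://127.0.0.1:8080. The http context options used here (method, header, proxy, request_fulluri, timeout) are standard PHP stream options; the proxy address, User-Agent string and URL are placeholders, not values taken from #26.

```php
<?php
// Hypothetical proxy address: replace with your own.
$proxy = 'tcp://127.0.0.1:8080';

$context = stream_context_create(array(
    'http' => array(
        'method'          => 'GET',
        'header'          => "User-Agent: Mozilla/5.0 (compatible; my-scraper)\r\n",
        'proxy'           => $proxy,
        'request_fulluri' => true, // many HTTP proxies expect the absolute URI in the request line
        'timeout'         => 10,   // seconds
    ),
));

// With hquery.php loaded, pass the context as the third argument (placeholder URL):
// $doc = hQuery::fromFile('http://example.com/', false, $context);
```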
Load HTML from a string
hQuery::fromHTML( string $html, string $url = NULL )
Load a remote HTML document
hQuery::fromUrl( string $url, array $headers = NULL, array|string $body = NULL, array $options = NULL )
For building advanced requests (POST, parameters, etc.), see hQuery::http_wr(),
though I recommend using a specialized (PSR-7?) library for making requests
and hQuery::fromHTML($html, $url=NULL) for processing the results.
See Guzzle, for example.
PSR-7 example:
If you don't have the cURL PHP extension, just replace php-http/curl-client with php-http/socket-client in the above command.
Another option is to use stream_context_create() to create a $context, then call hQuery::fromFile($url, false, $context).
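For example, a POST request can be expressed entirely through a stream context. This is a sketch using only standard PHP (http_build_query() and the http context options method, header, content); the form field names and the URL are illustrative placeholders.

```php
<?php
// Form fields to send: names and values are illustrative only.
$fields = array('q' => 'hquery', 'page' => 2);
$body   = http_build_query($fields); // URL-encoded form body

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                   . 'Content-Length: ' . strlen($body) . "\r\n",
        'content' => $body,
        'timeout' => 10, // seconds
    ),
));

// With hquery.php loaded (placeholder URL):
// $doc = hQuery::fromFile('https://example.com/search', false, $context);
```

Keeping the request in a plain stream context avoids any extra dependency; for anything more involved (cookies, retries, streaming), a PSR-7 client as recommended above is the more flexible choice.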
Processing the results
hQuery::find( string $sel, array|string $attr = NULL, hQuery\Node $ctx = NULL )
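A usage sketch for find(). It assumes hquery.php has already been loaded (via Composer or include_once) and that the class is reachable as either duzun\hQuery or hQuery; the code checks both names and does nothing if neither exists. Iterating over the result and reading attributes with attr() follow the jQuery-like style described above, but treat attr() and the "falsy when nothing matches" behaviour as assumptions to verify against the library source.

```php
<?php
$html  = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>';
$hrefs = array();

// Resolve the class name at runtime, so this file runs with or without hQuery loaded.
$class = class_exists('duzun\\hQuery') ? 'duzun\\hQuery'
       : (class_exists('hQuery') ? 'hQuery' : null);

if ($class !== null) {
    // Second argument is the document URL (see fromHTML above).
    $doc   = $class::fromHTML($html, 'https://example.com/');
    $links = $doc->find('a');            // plain tag selector, as in jQuery
    if ($links) {                        // assumed falsy when nothing matches
        foreach ($links as $a) {
            $hrefs[] = $a->attr('href'); // attr() is an assumption
        }
    }
}
```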
Note: If the charset meta attribute has a wrong value, or the internal conversion fails for any other reason, hQuery ignores the error and continues processing the original HTML, but registers an error message in $doc->html_errors['convert_encoding'].
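Since the conversion failure is not thrown, a caller that cares about it has to look for it. Below is a minimal sketch of such a check; the helper name hquery_encoding_error() is hypothetical, and the stdClass objects only stand in for a loaded document so the snippet runs without the library.

```php
<?php
// Hypothetical helper: returns the message stored under
// html_errors['convert_encoding'], or null if there is none.
function hquery_encoding_error($doc)
{
    $errors = isset($doc->html_errors) ? $doc->html_errors : null;
    if ((is_array($errors) || $errors instanceof ArrayAccess)
        && isset($errors['convert_encoding'])) {
        return $errors['convert_encoding'];
    }
    return null;
}

// Stand-ins for illustration; in real code $doc comes from hQuery::fromHTML() etc.
$ok  = new stdClass();
$ok->html_errors = array();
$bad = new stdClass();
$bad->html_errors = array('convert_encoding' => 'example message');
```

Because processing continues with the original HTML either way, logging the message (rather than aborting) is usually a reasonable reaction.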
🖧 Live Demo
On DUzun.Me
A lot of people ask for the source code of my Live Demo page. Here it is:
view-source:https://duzun.me/playground/hquery
🏃 Run the playground
You can easily run any of the examples/ on your local machine.
All you need is PHP installed in your system.
After you clone the repo with git clone https://github.com/duzun/hQuery.php.git, you have several options to start a web server.
Option 1:
Option 2 (browser-sync):
This option starts a live-reload server and is good for playing with the code.
Option 3 (VSCode):
If you are using VSCode, simply open the project and run the debugger (F5).
🔧 TODO
- Unit tests everything
- Document everything
- Cookie support (done: implemented in memory for redirects)
- Improve selectors to be able to select by attributes (done)
- Add more selectors
- Use HTTPlug internally
💖 Support my projects
I love Open Source. Whenever possible I share cool things with the world (check out NPM and GitHub).
If you like what I'm doing and this project helps you reduce development time, please consider the following:
- ★ Star and Share the projects you like (and use)
- ☕ Give me a cup of coffee - PayPal.me/duzuns (contact at duzun.me)
- ₿ Send me some Bitcoin at this address: bitcoin:3MVaNQocuyRUzUNsTbmzQC8rPUQMC9qafa