Download the PHP package cacing69/cquery without Composer

On this page you can find all versions of the PHP package cacing69/cquery. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After downloading, add one include, require_once('vendor/autoload.php');, and then import the classes with use statements.

If you use only one package, a project is not needed. But if you use more than one package, you cannot import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In this case, credentials are needed to access them. If a package comes from a private repository, please insert your credentials in the auth.json textarea. You can look here for more information.

  • Some hosting environments offer no terminal or SSH access, which makes it impossible to use Composer.
  • Using Composer is sometimes complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package cquery

Cquery (Crawl Query)


About Cquery

I wanted to be able to write queries for scraping a website like this; I expect it will let me generate scraping queries from anywhere.

Changelog

Please see CHANGELOG for more information on what has changed recently.

Currently experimenting

This is an attempt to make extracting data from web pages more enjoyable. My intention in creating it was to enable scraping of websites that use JS/AJAX to load their content.

To scrape pages whose content is loaded with JS/AJAX, you need an adapter outside this package, which was developed using symfony/panther. I don't want to add it as a default dependency in the cquery core, because this feature is optional for some people. Please check and understand its usage here; I refer to it as cacing69/cquery-panther-loader. See the symfony/panther documentation for installation and additional information.

All methods and usage instructions provided here are designed according to my needs. If you have any suggestions or feedback to improve them, it would be highly appreciated.

I hope someone kind-hearted will build a web app/UI dedicated to cquery, with a textarea for query input and a table to show the results, much like the cquery playground for running raw cquery. If someone is ready, I will create an API for it and start developing more logic in the Parser class.

What kind of thing is this

Cquery is an acronym for crawl query. It is used to extract text from HTML elements using PHP; simply put, it is a tool for crawling/scraping web pages. It is called a query because it adopts the structure of an SQL query, so you can think of your DOM/HTML document as a table that you query.
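To make the table analogy concrete, here is a minimal sketch that performs the same FROM/SELECT/WHERE steps by hand with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of cquery. The function name queryRows(), the #list .item containers and the featured class are hypothetical; the sketch only shows which part of a scraping job plays the role of each SQL clause.

```php
<?php
// The SQL analogy done by hand with PHP's DOM extension (not cquery):
//   FROM   -> container elements, one "row" each (#list .item)
//   SELECT -> per-row expressions (heading text, link href, link class)
//   WHERE  -> a condition on a selected value (class list contains "featured")
// cquery takes CSS selectors; DOMXPath needs the equivalent XPath.
function queryRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // FROM #list .item
    $from = "//*[@id='list']//*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]";

    $rows = [];
    foreach ($xpath->query($from) as $node) {
        $link = $xpath->query('.//a', $node)->item(0);
        // SELECT h1 AS title, a@href AS url, a@class AS class
        $row = [
            'title' => trim($xpath->evaluate('string(.//h1)', $node)),
            'url'   => $link ? $link->getAttribute('href') : null,
            'class' => $link ? $link->getAttribute('class') : '',
        ];
        // WHERE class has "featured"
        $classes = preg_split('/\s+/', trim($row['class']));
        if (in_array('featured', $classes, true)) {
            $rows[] = $row;
        }
    }
    return $rows;
}
```

Called on a page with two .item blocks, where only one link carries the featured class, queryRows() returns a single row with title, url and class keys.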

Let's play for a moment and figure out how to make website scraping easier, much like crafting a query for a database.

Please keep in mind that I haven't yet reached a beta/stable release for this library, so the available features are still very limited.

I would gladly accept any support or contribution from anyone. See CONTRIBUTING.md for help getting started.

Below are a few examples of using the advanced features.

Quick Installation

For example, suppose you have a simple HTML document like the one below.

Sample HTML: src/Samples/sample.html

List of available definer expressions

Below are the expressions you can use; they may change over time.

function | example | description
attr(attrName, selector) | attr(class, .link) | retrieves the value of the given attribute (here class) from every element/container matching the selector (.link)
length(selector) | length(h1) | retrieves the string length of the text in every element/container matching the selector (h1)
lower(selector) | lower(h1) | converts the text of the elements/containers matching the selector (h1) to lowercase
upper(selector) | upper(h1) | converts the text of the elements/containers matching the selector (h1) to uppercase
str(selector) | str(h1) | parses the element content as a string (h1)
int(selector) | int(h1) | parses the element content as an integer (h1)
float(selector) | float(h1) | parses the element content as a float (h1)
reverse(selector) | reverse(h1) | reverses the text of the elements matching the selector (h1)
replace(from, to, selector) | replace('lorem', 'ipsum', h1) | replaces "lorem" with "ipsum" in the text of the elements matching the selector (h1). There are 3 ways to use it:

replace('lorem', 'ipsum', h1)

replace(['lorem', 'dolor'], ['ipsum', 'sit'], h1)

replace(['lorem', 'ipsum'], 'ipsum', h1)

String arguments are wrapped in single quotes.
append(selectorParent) | append(title) as main_title | appends a single element from outside the main source as a new key (here main_title) on each item
append_node(selectorParent, selectorChildAfterParent) | append_node(div > .tags, a) as tags | appends the matching child elements as an array on each item; for its usage, refer to the sample code for $result_4 below
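The text definers and the three replace() forms can be approximated with ordinary PHP string functions. This sketch assumes that replace() behaves like PHP's str_replace(), since the three forms in the table line up with its string/array argument combinations, and it uses a hypothetical applyDefiner() helper. It is not cquery's implementation, which may, for example, treat multibyte text differently.

```php
<?php
// Hypothetical helper: what each text definer does to one extracted string.
// The mapping to PHP built-ins is an assumption for illustration only.
function applyDefiner(string $fn, string $text)
{
    switch ($fn) {
        case 'length':  return strlen($text);
        case 'lower':   return strtolower($text);
        case 'upper':   return strtoupper($text);
        case 'str':     return $text;
        case 'int':     return (int) $text;
        case 'float':   return (float) $text;
        case 'reverse': return strrev($text);
        default:
            throw new InvalidArgumentException("Unsupported definer: $fn");
    }
}

// The three replace() forms, expressed with str_replace():
// string -> string, array -> array (pairwise), array -> one string.
$a = str_replace('lorem', 'ipsum', 'lorem dolor');                      // "ipsum dolor"
$b = str_replace(['lorem', 'dolor'], ['ipsum', 'sit'], 'lorem dolor');  // "ipsum sit"
$c = str_replace(['lorem', 'ipsum'], 'ipsum', 'lorem ipsum');           // "ipsum ipsum"
```

In the third form every search term is replaced by the same string, which is why a list containing both 'lorem' and 'ipsum' collapses to 'ipsum ipsum'.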

List of alias rules

Below are the rules you can use; they may change over time.
Note: nested functions are supported.
# | example | key_result | description
1 | h1 | h1 | -
2 | h1 > a | h1_a | -
3 | h1 > a as title | title | -
4 | append_node(div > .tags, a) as _tags.key | _tags[key] | appends the elements as an array under that key
5 | append_node(div > .tags, a) as tags.*.text | tags[0]['text'] | the star (*) stands for every index; a new key (here text) is added to each array element
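One way to read the alias table: each dot-separated segment is an array key, and * fans a list of values out over every index. The assignByAlias() helper below is hypothetical and only reproduces the key_result shapes of a plain alias (such as tags), rule 4 (name.key) and rule 5 (name.*.key); cquery's own parser may handle more cases.

```php
<?php
// Hypothetical helper (not cquery code): stores $values on $item
// according to an alias such as "tags", "_tags.key" or "tags.*.text".
function assignByAlias(array $item, string $alias, array $values): array
{
    $parts = explode('.', $alias);
    $name = $parts[0];

    if (count($parts) === 1) {
        // "tags": the values become a plain list
        $item[$name] = $values;
    } elseif ($parts[1] === '*' && isset($parts[2])) {
        // "tags.*.text": every index gets an array with the key "text"
        $key = $parts[2];
        $item[$name] = array_map(function ($v) use ($key) {
            return [$key => $v];
        }, $values);
    } else {
        // "_tags.key": the whole list is stored under that key
        $item[$name][$parts[1]] = $values;
    }
    return $item;
}
```

For example, assignByAlias([], 'tags.*.text', ['php', 'scraping']) yields $r['tags'][0]['text'] === 'php', matching the tags[0]['text'] entry in the table.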

How to use filter

Note: nested filters are not supported yet.

operator | example | description
= or == | filter("h1", "=", "99") | retrieves data only from elements whose value equals 99
=== | filter("h1", "===", "99") | retrieves data only from elements whose value is identical to "99" (same value and same type)
< | filter("attr(id, a)", "<", 99) | retrieves data only from elements whose value is less than 99
<= | filter("attr(id, a)", "<=", 99) | retrieves data from elements whose value is less than or equal to 99
> | filter("attr(id, a)", ">", 99) | retrieves data from elements whose value is greater than 99
>= | filter("attr(id, a)", ">=", 99) | retrieves data from elements whose value is greater than or equal to 99
<> or != | filter("attr(id, a)", "!=", 99) | retrieves data from elements whose value is not equal to 99
!== | filter("attr(id, a)", "!==", 99) | retrieves data from elements whose value is not equal to 99 or not of the same type
has | filter("attr(class, a)", "has", "foo") | retrieves data only from elements that have the class "foo"
regex | filter("attr(class, a)", "regex", "/[a-z]+\-[0-9]+\-[a-z]+/im") | retrieves data only from elements whose value matches the given regex pattern; this pattern matches values such as a-192-ab, b-12-ac and zx-1223-ac
like | filter("attr(class, a)", "like", "%foo%") | retrieves data according to a value pattern. There are 3 forms:

filter("attr(class, a)", "like", "%foo%") = anything containing the phrase "foo"

filter("attr(class, a)", "like", "%foo") = anything ending with "foo"

filter("attr(class, a)", "like", "foo%") = anything starting with "foo"
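The operator table can be modelled as a single predicate over one extracted value. The matchesFilter() helper below is hypothetical and relies on PHP's own comparison semantics (== versus ===, and so on); it assumes that has checks a space-separated class list and that like's % wildcards translate into an anchored regular expression. Treat it as a reading aid for the table, not as cquery's code.

```php
<?php
// Hypothetical helper (not cquery code): evaluates one filter condition
// against a single extracted value, following the operator table.
function matchesFilter($value, string $op, $expected): bool
{
    switch ($op) {
        case '=':
        case '==':
            return $value == $expected;
        case '===':
            return $value === $expected;
        case '<':
            return $value < $expected;
        case '<=':
            return $value <= $expected;
        case '>':
            return $value > $expected;
        case '>=':
            return $value >= $expected;
        case '<>':
        case '!=':
            return $value != $expected;
        case '!==':
            return $value !== $expected;
        case 'has':
            // Treat the value as a space-separated class list.
            $classes = preg_split('/\s+/', trim((string) $value));
            return in_array($expected, $classes, true);
        case 'regex':
            return preg_match($expected, (string) $value) === 1;
        case 'like':
            // "%foo%" -> /^.*foo.*$/s, "%foo" -> /^.*foo$/s, "foo%" -> /^foo.*$/s
            $parts = array_map(function ($p) {
                return preg_quote($p, '/');
            }, explode('%', $expected));
            return preg_match('/^' . implode('.*', $parts) . '$/s', (string) $value) === 1;
        default:
            throw new InvalidArgumentException("Unknown operator: $op");
    }
}
```

Note how "===" and "!==" differ from "=" and "!=": an attribute value extracted as the string "99" equals the integer 99 but is not identical to it.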

So, let's start scraping this website.

Or you can use the raw method.

And here are the results:

[screenshot]

Another example with anonymous function

Output of $result_1 (screenshot): https://gcdnb.pbrd.co/images/qtItVezcEUq7.png
Output of $result_2 (screenshot): https://gcdnb.pbrd.co/images/qtItVezcEUq7.png

How to load a source page from a URL

Output of $result_3 (screenshot): https://gcdnb.pbrd.co/images/We0ea7frlZw1.png

How to use append_node(a, b)

Output of $result_4 (screenshot): https://gcdnb.pbrd.co/images/46mETzAatjur.png

Another example of using append_child() with a custom key for each item

Output of $result_5 (screenshot): https://gcdnb.pbrd.co/images/NYUsStjIshsf.png
Output of $result_6 (screenshot): https://gcdnb.pbrd.co/images/lXhhw7hA8LYf.png

How to use replace

Methods to manipulate query results

There are 2 methods in Cquery for manipulating query results.

  1. Each item closure: ...->eachItem(function ($el, $i) {}) or ...->eachItem(function ($el) {}). Example:

Basically, you can execute any action on each item. In the given example, a new key, "price", is inserted into each item, and if the index equals 2 (the third item), the price is set to 1000.

  2. On obtained results closure: ...->onObtainedResults(function ($results) {}). Example:

Here the closure receives the whole array produced by the query, and you are free to manipulate it however you like. I've also included another example for cases where you need to load different details from another page for each entry; you can check it here: Check async multiple request.
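Both closures can be pictured as plain PHP functions over arrays. The eachItem() and onObtainedResults() functions below are hypothetical stand-ins, not cquery's methods; this sketch assumes the per-item callback returns the modified item, and the 0 used as the default price is only a placeholder. The first part reproduces the price example described above; the second de-duplicates rows by URL, a task that needs the whole result set at once.

```php
<?php
// Hypothetical stand-in for ->eachItem(): calls $callback($item, $index)
// for every item and collects the returned (modified) items.
function eachItem(array $items, callable $callback): array
{
    $out = [];
    foreach (array_values($items) as $i => $item) {
        $out[] = $callback($item, $i);
    }
    return $out;
}

// Hypothetical stand-in for ->onObtainedResults(): passes the complete
// result array to the callback once and returns what the callback returns.
function onObtainedResults(array $results, callable $callback): array
{
    return $callback($results);
}

$items = [
    ['title' => 'Title 1', 'url' => '/a'],
    ['title' => 'Title 2', 'url' => '/b'],
    ['title' => 'Title 3', 'url' => '/a'],
];

// Per item: add a "price" key; the third item (index 2) gets 1000.
// The 0 used for the other items is a placeholder assumption.
$withPrice = eachItem($items, function (array $el, int $i): array {
    $el['price'] = $i === 2 ? 1000 : 0;
    return $el;
});

// Whole result set: keep only the first row for each URL.
$unique = onObtainedResults($withPrice, function (array $rows): array {
    $seen = [];
    $out = [];
    foreach ($rows as $row) {
        if (!isset($seen[$row['url']])) {
            $seen[$row['url']] = true;
            $out[] = $row;
        }
    }
    return $out;
});
```

A closure written as function ($el) also works with this eachItem(), because PHP ignores extra arguments passed to a userland closure.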

How to handle multiple requests for each element

Consider a scenario where you need to load the details for each item, and those details are on a different URL, which means you have to load every page.

You should use a client that can perform non-blocking requests, such as amphp/http-client, guzzle or react/http (ReactPHP), or use curl_multi_init; for an object-oriented way to use curl, check php-curl-class.
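As an illustration of the curl_multi_init route, here is a minimal sketch that fetches several detail pages concurrently with PHP's curl_multi API (the curl extension must be enabled). The fetchAll() name and the 10-second timeout default are assumptions; error handling is reduced to returning whatever body curl collected, so use curl_multi_info_read() if you need per-transfer result codes.

```php
<?php
// Fetches several URLs concurrently with PHP's curl_multi API.
// Returns the response bodies keyed like $urls; a failed transfer
// yields an empty string or null instead of throwing.
function fetchAll(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers at once until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $bodies;
}
```

For example, with hypothetical URLs, fetchAll(['a' => 'https://example.com/a', 'b' => 'https://example.com/b']) returns ['a' => ..., 'b' => ...] once both transfers have finished, instead of waiting for them one after another.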

I suggest using ReactPHP to make asynchronous requests.

Here's a comparison with and without ReactPHP.

Without ReactPHP:

[screenshot]

With ReactPHP:

[screenshot]

In this scenario, there are 320 rows of data and the details for each row are loaded, which means 320 additional HTTP requests are made to fetch the individual details.

How to perform actions after the page loads (click a link/submit a form)

  1. Submit Form

    If you need to submit data to retrieve other data for scraping, you'll need to handle this case.

    • Case 1: without a crawler object

    The code above performs a form submission with the limit field (matched by input name) set to 5.

    • Case 2: with a crawler object

    Let's simulate this on Wikipedia by searching for the phrase 'sambas' and checking whether the results match a manual search.

    Result:

    [screenshot]

    Web page:

    [screenshot]

    Page source:

    [screenshot]

  2. Click Link: if you want to click a link on a loaded page, see the code below.

[screenshot]

Click that link before you start scraping.

Result after clicking the link:

[screenshot]
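For the form-submission case, the request a browser would send can be built from the page's own form markup. This sketch uses only PHP's DOM extension and http_build_query(); buildFormRequest() is a hypothetical helper, not cquery's submit API, and it only reads input elements (no select or textarea) from the first form on the page. Overriding ['limit' => 5] mirrors the case 1 description of the form submission.

```php
<?php
// Hypothetical helper (not cquery's API): builds the request for the first
// <form> in $html, overriding fields by input name (e.g. ['limit' => 5]).
function buildFormRequest(string $html, array $overrides): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        throw new RuntimeException('No <form> found');
    }

    // Collect the default value of every named <input>.
    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }

    return [
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
        'action' => $form->getAttribute('action'),
        // Overrides replace defaults with the same name, keeping field order.
        'body'   => http_build_query(array_merge($fields, $overrides)),
    ];
}
```

You would then send the returned method, action and body with any HTTP client, resolving a relative action against the page URL first.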

How to scrape websites loaded by JS/AJAX with PHP

If the web page to be scraped uses JavaScript and AJAX to load its data, you need to add the Panther loader for cquery.

Install cacing69/cquery-panther-loader with Composer

More Examples

A full list of methods with example code can be found in the tests.

Note

I've recently started building this, and if anyone is interested, I would certainly appreciate feedback from everyone who has read or seen my little project, in any form (issue, pull request or whatever). Right now I'm considering how to make it more flexible and user-friendly for website scraping.

This is just the beginning, and I will continue to develop it as long as I can.

License

The MIT License (MIT). Please see License File for more information.


All versions of cquery with dependencies

Requires php Version ^7.2|^8.1
symfony/browser-kit Version ^5.4|^6.3
symfony/css-selector Version ^5.4|^6.3
symfony/http-client Version ^5.4|^6.3
symfony/deprecation-contracts Version ^2.5|^3.4
symfony/dom-crawler Version 5.4|^6.3
doctrine/collections Version ^1.8|^2.1|^3.0
cocur/slugify Version ^4.4
symfony/mime Version 5.4|^6.3
