Download the PHP package wenhainan/php-crawler without Composer
On this page you can find all versions of the php package wenhainan/php-crawler. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package php-crawler
Short Description Simple, elegant, extensible PHP web scraper (crawler/spider) based on phpQuery, using CSS3 DOM selectors.
License MIT
Homepage http://www.waytomilky.com
Information about the package php-crawler
phpCrawler
phpCrawler is a simple, elegant, extensible PHP web scraper (crawler/spider) based on phpQuery.
Chinese documentation
Features
- Uses the same CSS3 DOM selectors as jQuery
- Offers the same DOM manipulation API as jQuery
- Includes a generic list-crawling routine
- Includes a capable HTTP request suite that handles complex requests such as simulated login, spoofed browser headers, and HTTP proxies
- Includes a fix for garbled text caused by character-encoding mismatches
- Provides content filtering driven by jQuery selectors
- Has a highly modular design with strong extensibility
- Has an expressive API
- Has a rich set of plugins
Through plug-ins you can easily implement things like:
- Multithreaded crawling
- Crawling pages rendered dynamically by JavaScript (PhantomJS/headless WebKit)
- Downloading images to local storage
- Simulating browser behavior such as submitting forms
- Web crawling
- and more
Requirements
- PHP >= 7.1
Installation
Install with Composer: composer require wenhainan/php-crawler
Usage
DOM Traversal and Manipulation
- Crawl all image links on GitHub
- Crawl Google search results
- More usage
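The image-link item above can be illustrated without the package at all. The following is a minimal sketch that uses PHP's built-in DOM extension (listed among the dependencies as ext-dom), not the package's own API, which this page does not show. The helper name extractImageLinks and the sample markup are assumptions made for illustration.

```php
<?php
// Sketch using PHP's built-in DOM extension, NOT the package's own API.

/**
 * Return the src attribute of every <img> element in an HTML string.
 * (extractImageLinks is a hypothetical helper name.)
 */
function extractImageLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // XPath counterpart of the CSS selector "img[src]".
    foreach ($xpath->query('//img[@src]') as $img) {
        $links[] = $img->getAttribute('src');
    }
    return $links;
}

// Inline sample markup stands in for a downloaded page.
$sampleHtml = '<div><img src="/a.png" alt="a"><p>text</p>'
            . '<img src="https://example.com/b.jpg"></div>';
print_r(extractImageLinks($sampleHtml));
```

A CSS-selector library such as phpQuery would express the same query as a selector string; the XPath form is used here only because it ships with PHP.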
List crawl
Crawl the title and link of the Google search results list:
Results:
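The original code and output for this example are not present on this page. As a rough illustration of rule-based list crawling, here is a sketch built on PHP's DOM extension rather than the package's API. The function crawlList, its rule format ([relative XPath, attribute name or 'text']), and the sample markup are all assumptions; real search result pages use different markup.

```php
<?php
// Sketch with PHP's DOM extension; the rule format below is an assumption for illustration.

/**
 * Apply field rules to each list item matched by $itemXPath.
 * Each rule is [relative XPath, attribute name or 'text'].
 */
function crawlList(string $html, string $itemXPath, array $rules): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query($itemXPath) as $item) {
        $row = [];
        foreach ($rules as $field => [$path, $attr]) {
            // Evaluate the rule relative to the current list item.
            $node = $xpath->query($path, $item)->item(0);
            if ($node === null) {
                $row[$field] = null;
            } elseif ($attr === 'text') {
                $row[$field] = trim($node->textContent);
            } else {
                $row[$field] = $node->getAttribute($attr);
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

// Hypothetical result-list markup.
$listHtml = '<ul>'
          . '<li class="r"><a href="https://example.com/1"> First result </a></li>'
          . '<li class="r"><a href="https://example.com/2">Second result</a></li>'
          . '</ul>';

$listRules = [
    'title' => ['.//a', 'text'],
    'link'  => ['.//a', 'href'],
];
print_r(crawlList($listHtml, '//li[@class="r"]', $listRules));
```

Keeping the rules in a plain array separates what to extract from how to extract it, so the same function can be reused for other list pages by changing only the rules.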
Encoding conversion
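The package's own conversion call is not shown on this page. The underlying idea, converting a page from its source encoding to UTF-8 so the text is no longer garbled, might look like the sketch below. It assumes the mbstring extension is installed; toUtf8 is a hypothetical helper, and the caller is assumed to know the source encoding.

```php
<?php
// Sketch using the mbstring extension (assumed installed); not the package's API.

/**
 * Convert page bytes to UTF-8, leaving them untouched when they already
 * form valid UTF-8. This check is a heuristic: a few non-UTF-8 byte
 * sequences can also happen to be valid UTF-8.
 */
function toUtf8(string $bytes, string $sourceEncoding): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// Simulate a GBK-encoded page by encoding a UTF-8 string first.
$gbkBytes = mb_convert_encoding('中文标题', 'GBK', 'UTF-8');
echo toUtf8($gbkBytes, 'GBK'), PHP_EOL;
```

Converting before parsing matters because DOM-based selectors operate on the decoded text; running them on mis-decoded bytes yields garbled titles and links.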
HTTP Client (GuzzleHttp)
- Log in to GitHub by sending cookies
- Use an HTTP proxy
- Simulated login
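Since the HTTP client is described as GuzzleHttp, the cookie and proxy items above presumably map onto Guzzle request options. The sketch below only builds such an options array and sends nothing; how the package forwards options to Guzzle is not shown on this page. The helper buildRequestOptions, the cookie names, and the proxy address are assumptions; headers, proxy, and timeout are standard Guzzle request option names.

```php
<?php
// Sketch: builds an options array in the shape Guzzle's request() accepts.
// Nothing is sent here; buildRequestOptions is a hypothetical helper.

function buildRequestOptions(array $cookies = [], ?string $proxy = null, ?string $userAgent = null): array
{
    $options = ['headers' => [], 'timeout' => 30];

    if ($userAgent !== null) {
        // Present the request as coming from a regular browser.
        $options['headers']['User-Agent'] = $userAgent;
    }
    if ($cookies !== []) {
        // Serialize name/value pairs into a single Cookie header.
        $pairs = [];
        foreach ($cookies as $name => $value) {
            $pairs[] = $name . '=' . $value;
        }
        $options['headers']['Cookie'] = implode('; ', $pairs);
    }
    if ($proxy !== null) {
        $options['proxy'] = $proxy;
    }
    return $options;
}

// Placeholder values; substitute a real session cookie and proxy.
$requestOptions = buildRequestOptions(
    ['session_id' => 'abc123', 'logged_in' => 'yes'],
    'http://127.0.0.1:8888',
    'Mozilla/5.0 (X11; Linux x86_64)'
);
print_r($requestOptions);
```

With Guzzle installed, an array like this can be passed as the third argument of GuzzleHttp\Client::request('GET', $url, $options).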
Submit forms
Log in to GitHub
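Submitting a login form usually means reading the form's existing fields (including hidden tokens) and then filling in the credentials. The sketch below shows that step with PHP's DOM extension, not the package's form API, which is absent from this page. The helper collectFormFields, the form markup, and the field names are hypothetical and do not reflect GitHub's real login form.

```php
<?php
// Sketch with PHP's DOM extension; form markup and field names are hypothetical.

/**
 * Collect name => value pairs from a form's <input> elements and
 * overlay caller-supplied values (for example, credentials).
 */
function collectFormFields(string $html, string $formXPath, array $overrides = []): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $form = $xpath->query($formXPath)->item(0);
    if ($form === null) {
        return $overrides;
    }

    $fields = [];
    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        // Hidden inputs such as CSRF tokens carry a preset value.
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return array_merge($fields, $overrides);
}

$formHtml = '<form id="login" action="/session" method="post">'
          . '<input type="hidden" name="csrf_token" value="abc123">'
          . '<input type="text" name="login">'
          . '<input type="password" name="password">'
          . '</form>';

$payload = collectFormFields($formHtml, '//form[@id="login"]', [
    'login'    => 'user',
    'password' => 'secret',
]);
print_r($payload);
```

The resulting array is the body to POST to the form's action URL, sent in the same cookie session that fetched the form so the hidden token stays valid.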
Bind function extension
Define a custom myHttp extension method:
Or package it as a class, and then bind it:
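The package's bind call is not reproduced on this page. The general mechanism of attaching a named function to an object at runtime can be sketched as follows. The Extensible class is a hypothetical stand-in, not the package's implementation; only the method name myHttp comes from the text above.

```php
<?php
// Hypothetical mini class for illustration; not the package's implementation.

class Extensible
{
    /** @var Closure[] bound methods, keyed by name */
    private $bound = [];

    public $baseUrl = 'https://example.com';

    /** Register a closure under a method name and allow chaining. */
    public function bind(string $name, Closure $fn): self
    {
        $this->bound[$name] = $fn;
        return $this;
    }

    /** Dispatch calls to unknown methods to the bound closures. */
    public function __call(string $name, array $args)
    {
        if (!isset($this->bound[$name])) {
            throw new BadMethodCallException("Method {$name} is not bound");
        }
        // Closure::call runs the closure with $this set to the current object.
        return $this->bound[$name]->call($this, ...$args);
    }
}

$crawler = new Extensible();
$crawler->bind('myHttp', function (string $path) {
    // A real extension would fetch the page; this one only builds the URL.
    return $this->baseUrl . $path;
});

echo $crawler->myHttp('/trending'), PHP_EOL;
```

Because the closure runs with $this pointing at the host object, a bound method can read the object's state just like a method declared in the class.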
Plugin used
- Use the PhantomJS plugin to crawl pages rendered dynamically by JavaScript:
- Use the cURL multithreading plugin to crawl GitHub trending with multiple threads:
Author
wenhainan
License
phpCrawler is licensed under the MIT License. See the LICENSE file for more details.
All versions of php-crawler with dependencies
jaeger/phpquery-single Version ^1
jaeger/g-http Version ^1.1
ext-dom Version *
tightenco/collect Version >5.0