Download the PHP package rajpurohithitesh/advance-phpscraper without Composer

On this page you can find all versions of the PHP package rajpurohithitesh/advance-phpscraper. You can download and install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After downloading, include the autoloader once with require_once('vendor/autoload.php'); and then import the classes you need with use statements.

Example:
If you use only one package, a project is not needed. If you use more than one package, however, you need a project: without one, it is not possible to import the classes with use statements.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In this case, credentials are needed to access them. If a package comes from a private repository, please enter the credentials in the auth.json textarea. You can look here for more information.

  • Some hosting environments cannot be accessed via a terminal or SSH, so Composer cannot be used there.
  • Using Composer can be complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as binary. This makes your build process faster and you don't need to expose your credentials for private repositories.

Information about the package advance-phpscraper

Advance PHP Scraper

Advance PHP Scraper is a powerful, modular, and extensible PHP library designed for web scraping. It simplifies extracting data from websites, such as links, images, meta tags, structured data, and more, while offering advanced features like plugin support, rate limiting, and asynchronous scraping. Whether you're a beginner or an experienced developer, this library provides a flexible and user-friendly interface to scrape web content efficiently.

This document is crafted to be beginner-friendly, with detailed explanations and examples to help you get started, even if you're new to PHP or web scraping. By the end, you'll know how to install, use, and extend the library with ease.


Table of Contents

  1. What is Advance PHP Scraper?
    • Why Use This Library?
    • Who Should Use It?
  2. Key Features
    • Core Scraping Features
    • Advanced Features
    • Plugin System
  3. Getting Started
    • Prerequisites
    • Installation
    • Verifying Installation
  4. Basic Usage: Your First Scrape
    • Scraping a Simple Website
    • Extracting Links
    • Extracting Images
    • Extracting Meta Tags
    • Using the Command-Line Interface (CLI)
  5. Intermediate Usage: Leveling Up
    • Scraping Sitemaps
    • Scraping RSS Feeds
    • Parsing Assets (CSV, JSON, XML)
    • Checking HTTP Status Codes
  6. Advanced Usage: Power User Mode
    • Rate Limiting: Playing Nice with Servers
    • Queue System: Scraping Multiple URLs
    • API Integration: Combining Scraping with APIs
    • Custom CSS Selectors
  7. Plugins: Supercharging Your Scraper
    • What Are Plugins?
    • Available Plugins
    • How to Use Plugins
    • Learn More About Plugins
  8. Configuration: Customizing Your Scraper
    • Setting User Agent
    • Adjusting Timeout
    • Following Redirects
    • Using Constructor Configuration
  9. Testing: Ensuring Everything Works
    • Running Tests
    • Writing Your Own Tests
  10. Troubleshooting: Solving Common Problems
    • Installation Issues
    • Scraping Errors
    • Plugin Problems
  11. Contributing: Joining the Community
  12. License: Understanding Usage Rights
  13. Resources: Further Learning

What is Advance PHP Scraper?

Advance PHP Scraper is a PHP library that helps you extract data from websites, like a super-smart librarian who can quickly find and summarize books for you. Web scraping is like copying information from a webpage (e.g., product names, prices, or blog titles) using code instead of manually copying and pasting. This library makes it easy to navigate websites, grab specific data, and even handle tricky tasks like scraping JavaScript-heavy pages or processing thousands of URLs at once.

Imagine you’re at a giant library (the internet), and you need to collect all book titles from a specific shelf (a website). Doing this by hand would take forever, but Advance PHP Scraper is like a magical robot that does it for you in seconds. It’s designed to be:

Why Use This Library?

There are other scraping tools out there, but here’s why Advance PHP Scraper is special:

Who Should Use It?


Key Features

Let’s explore what Advance PHP Scraper can do. Think of these features as tools in a toolbox, each designed for a specific job.

Core Scraping Features

These are the basic tools you’ll use most often:

Advanced Features

These tools are for power users:

Plugin System

Plugins are like optional upgrades for your toolbox:


Getting Started

Let’s set up the library and run your first scrape. This section is like a cooking recipe: follow each step, and you’ll have a working scraper in no time.

Prerequisites

Before you start, you need:
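The list of prerequisites did not survive on this page. The package declares php ^8.0 and the ext-* extensions in the dependency list at the end of this page, and installation uses Composer. From that, you need PHP 8.0 or later, Composer, and those PHP extensions.

The script below is a minimal sketch for checking the PHP side. checkEnvironment() is a hypothetical helper written for illustration; it is not part of Advance PHP Scraper.

```php
<?php
// Hypothetical helper (not part of Advance PHP Scraper): lists problems with
// the running PHP version and the loaded extensions.
function checkEnvironment(string $minVersion, array $extensions): array
{
    $problems = [];
    if (version_compare(PHP_VERSION, $minVersion, '<')) {
        $problems[] = "PHP {$minVersion} or later is required, found " . PHP_VERSION;
    }
    foreach ($extensions as $extension) {
        if (!extension_loaded($extension)) {
            $problems[] = "Missing extension: {$extension}";
        }
    }
    return $problems;
}

// Extension names taken from the ext-* entries in the package's requirements.
$required = [
    'dom', 'libxml', 'gd', 'simplexml', 'mbstring', 'curl', 'fileinfo', 'xml',
    'zlib', 'json', 'iconv', 'pcre', 'ctype', 'xmlwriter', 'tokenizer',
    'filter', 'xmlreader', 'sockets',
];

$problems = checkEnvironment('8.0.0', $required);
echo $problems === [] ? "Environment looks OK\n" : implode("\n", $problems) . "\n";
```

Run it with php on the command line. If an extension is reported missing, enable it in php.ini or install it through your system's package manager.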

Installation

Here’s how to install the library:

  1. Create a Project Folder: Make a new directory for your scraping project:

  2. Install Advance PHP Scraper: From inside the project folder, run this Composer command to download the library and its dependencies:

    composer require rajpurohithitesh/advance-phpscraper

    This creates a vendor/ folder with the library and dependencies like symfony/browser-kit and guzzlehttp/guzzle.

  3. Check the Files: After installation, you’ll see:
    • vendor/: Contains the library and dependencies.
    • composer.json: Lists the project’s dependencies.
    • composer.lock: Locks dependency versions.

Verifying Installation

Let’s make sure everything works. Create a file named test.php:

Run it:

Expected Output:

If you see this, you’re good to go! If you get an error, check the Troubleshooting section.


Basic Usage: Your First Scrape

Now, let’s scrape some real data! Think of this as your first adventure with the library, like learning to ride a bike with training wheels.

Scraping a Simple Website

Let’s scrape the title of a webpage. Create a file named scrape_title.php:

Run it:

Expected Output:

Line-by-Line Explanation:

What’s Happening Behind the Scenes?
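The library's internals are not shown on this page. As a rough, library-independent illustration of what getting a title involves once the HTML has been downloaded, here is a sketch using PHP's built-in DOM extension (ext-dom, one of the package's requirements). extractTitle() is a hypothetical helper, not the library's API.

```php
<?php
// Plain-PHP sketch (ext-dom), not the library's API: find the <title> text
// in an HTML document that has already been downloaded.
function extractTitle(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}

$html = '<html><head><title> Example Domain </title></head><body></body></html>';
echo extractTitle($html), "\n"; // prints "Example Domain"
```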

Extracting Links

Let’s grab all the links on a page. Create scrape_links.php:

Run it:

Expected Output:

Line-by-Line Explanation:

Why This is Cool:
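Under the hood, link extraction comes down to finding every <a> element that has an href attribute. The sketch below does this with plain DOMDocument and DOMXPath rather than the library's own methods. extractLinks() is hypothetical, and it does not resolve relative URLs such as /about against the page address.

```php
<?php
// Plain-PHP sketch (ext-dom + XPath), not the library's API: collect every
// <a href> on a page together with its visible text.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $href = trim($a->getAttribute('href'));
        // Skip empty links, in-page anchors and javascript: pseudo-links.
        if ($href === '' || $href[0] === '#' || stripos($href, 'javascript:') === 0) {
            continue;
        }
        $links[] = ['url' => $href, 'text' => trim($a->textContent)];
    }
    return $links;
}

$html = '<a href="/about">About us</a> <a href="#top">Top</a> <a href="https://example.com">Home</a>';
print_r(extractLinks($html));
```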

Extracting Images

Now, let’s grab images. Create scrape_images.php:

Run it:

Expected Output:

Explanation:

Why This is Useful:
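For comparison, here is a plain-PHP sketch of image extraction with DOMDocument. The helper extractImages() is hypothetical. The data-src fallback is an assumption about common lazy-loading markup, not something this page says the library does.

```php
<?php
// Plain-PHP sketch (ext-dom), not the library's API: list image URLs and alt text.
function extractImages(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Lazy-loaded images often keep the real URL in data-src (assumption).
        $src = $img->getAttribute('src') ?: $img->getAttribute('data-src');
        if ($src === '') {
            continue; // no usable URL on this <img>
        }
        $images[] = ['src' => $src, 'alt' => $img->getAttribute('alt')];
    }
    return $images;
}

$html = '<img src="/logo.png" alt="Logo"><img data-src="/photo.jpg"><img>';
print_r(extractImages($html));
```

An empty alt value is worth flagging in the results, since missing alt text is a common accessibility and SEO issue.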

Extracting Meta Tags

Meta tags contain SEO and social media data. Create scrape_meta.php:

Run it:

Expected Output:

Explanation:
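Meta tags come in two common forms:

  • name="..." tags, for example description.
  • property="..." tags, such as the Open Graph tag og:title.

The sketch below collects both into one array keyed by the lower-cased name. It uses DOMDocument and a hypothetical extractMeta() helper rather than the library's API.

```php
<?php
// Plain-PHP sketch (ext-dom), not the library's API: gather <meta> tags by name or property.
function extractMeta(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        // SEO tags use name="...", Open Graph tags use property="og:...".
        $key = $tag->getAttribute('name') ?: $tag->getAttribute('property');
        if ($key !== '') { // skips tags like <meta charset="utf-8">
            $meta[strtolower($key)] = $tag->getAttribute('content');
        }
    }
    return $meta;
}

$html = '<head><meta charset="utf-8"><meta name="description" content="A demo page">'
    . '<meta property="og:title" content="Demo"></head>';
print_r(extractMeta($html));
```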

Using the Command-Line Interface (CLI)

The CLI lets you scrape without writing PHP code. Run:

Expected Output (JSON):

Explanation:


Intermediate Usage: Leveling Up

Now that you’ve mastered the basics, let’s explore more features to make your scraper smarter.

Scraping Sitemaps

Sitemaps list the pages a website wants search engines to find, like a table of contents for a book. Create scrape_sitemap.php:

Run it:

Expected Output:

Explanation:

Why This is Awesome:
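A standard XML sitemap wraps each page in <url><loc>...</loc></url> inside the http://www.sitemaps.org/schemas/sitemap/0.9 namespace. The following sketch reads those URLs with SimpleXML. parseSitemap() is a hypothetical helper, not the library's method. It also lists <loc> entries from sitemap index files, but it does not download the nested sitemaps they point to.

```php
<?php
// Plain-PHP sketch (ext-simplexml), not the library's API: list the <loc> URLs
// in an XML sitemap. The elements live in the sitemaps.org namespace, so it
// has to be registered before querying with XPath.
function parseSitemap(string $xml): array
{
    $sitemap = new SimpleXMLElement($xml);
    $sitemap->registerXPathNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    $urls = [];
    foreach ($sitemap->xpath('//sm:loc') as $loc) {
        $urls[] = trim((string) $loc);
    }
    return $urls;
}

$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog</loc><lastmod>2024-01-01</lastmod></url>
</urlset>
XML;
print_r(parseSitemap($xml));
```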

Scraping RSS Feeds

RSS feeds are like news tickers for websites. Create scrape_rss.php:

Run it:

Expected Output:

Explanation:

Why This is Handy:
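An RSS 2.0 feed is an XML document with <item> elements under <rss><channel>. Each item carries fields such as <title>, <link>, and <pubDate>. Here is a SimpleXML sketch that reads them. parseRss() is hypothetical, and Atom feeds, which use different element names, are not handled.

```php
<?php
// Plain-PHP sketch (ext-simplexml), not the library's API: read items from an RSS 2.0 feed.
function parseRss(string $xml): array
{
    $rss = new SimpleXMLElement($xml);
    $items = [];
    foreach ($rss->channel->item as $item) {
        // A missing child element casts to an empty string.
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
            'date'  => (string) $item->pubDate,
        ];
    }
    return $items;
}

$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>https://example.com/first</link><pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate></item>
    <item><title>Second post</title><link>https://example.com/second</link></item>
  </channel>
</rss>
XML;
print_r(parseRss($xml));
```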

Parsing Assets (CSV, JSON, XML)

You can also parse CSV, JSON, and XML files linked from a page. Create parse_asset.php:

Explanation:

Why This is Useful:
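This page does not show how the library decides which parser to use. One simple approach is to dispatch on the file extension and use PHP's built-in str_getcsv(), json_decode(), and SimpleXML. The sketch below assumes the extension can be trusted; parseAsset() is a hypothetical helper.

```php
<?php
// Plain-PHP sketch, not the library's API: decode a downloaded asset based on
// its file extension. Real code might inspect the Content-Type header instead.
function parseAsset(string $filename, string $content): array
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    switch ($extension) {
        case 'json':
            // Expects a JSON object or array; malformed input throws JsonException.
            return json_decode($content, true, 512, JSON_THROW_ON_ERROR);
        case 'csv':
            // Simplification: splitting on newlines breaks quoted fields containing newlines.
            $lines = preg_split('/\R/', trim($content));
            $rows = array_map(fn (string $line) => str_getcsv($line, ',', '"', ''), $lines);
            $header = array_shift($rows);
            // Key each row by the header; array_combine throws if column counts differ.
            return array_map(fn (array $row) => array_combine($header, $row), $rows);
        case 'xml':
            // Round-trip through JSON to turn the SimpleXML tree into nested arrays.
            return json_decode(json_encode(new SimpleXMLElement($content)), true);
        default:
            throw new InvalidArgumentException("Unsupported asset type: {$extension}");
    }
}

print_r(parseAsset('prices.csv', "product,price\nPen,1.50\nBook,12.00"));
```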

Checking HTTP Status Codes

Ensure a page loaded correctly with getStatusCode():

Expected Output:

Explanation:
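Assuming getStatusCode() returns the numeric HTTP status, the first digit tells you the class of response. The small sketch below maps codes to the classes defined by the HTTP specification; describeStatus() is hypothetical.

```php
<?php
// Plain-PHP sketch, not the library's API: classify an HTTP status code by its first digit.
function describeStatus(int $code): string
{
    return match (intdiv($code, 100)) {
        1 => 'informational',
        2 => 'success',
        3 => 'redirect',
        4 => 'client error',
        5 => 'server error',
        default => 'invalid status code',
    };
}

foreach ([200, 301, 404, 503] as $code) {
    echo $code, ': ', describeStatus($code), "\n";
}
```

Checking the class rather than only 200 lets a scraper treat any 2xx as success and back off on 429 or 5xx responses.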


Advanced Usage: Power User Mode

Ready to take your scraper to the next level? These features are like rocket boosters for your scraping adventures.

Rate Limiting: Playing Nice with Servers

Rate limiting prevents your scraper from overwhelming servers, which could lead to bans. Think of it as pacing yourself while eating cookies so you don’t get kicked out of the kitchen. Create rate_limit.php:

Run it:

Expected Output:

Explanation:

Tip:
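This page does not show the library's rate-limiter settings. A common way to implement the idea is to enforce a minimum interval between requests. The sketch below does this with a hypothetical SimpleRateLimiter class that sleeps until enough time has passed since the previous call. It uses hrtime() because a monotonic clock is not affected by changes to the system clock.

```php
<?php
// Plain-PHP sketch, not the library's rate limiter: enforce a minimum delay
// between consecutive requests.
final class SimpleRateLimiter
{
    private ?float $lastRequest = null; // seconds on the monotonic clock

    public function __construct(private float $minIntervalSeconds)
    {
    }

    // Blocks until at least $minIntervalSeconds have passed since the previous call.
    public function wait(): void
    {
        if ($this->lastRequest !== null) {
            $elapsed = hrtime(true) / 1e9 - $this->lastRequest;
            $remaining = $this->minIntervalSeconds - $elapsed;
            if ($remaining > 0) {
                usleep((int) ceil($remaining * 1e6));
            }
        }
        $this->lastRequest = hrtime(true) / 1e9;
    }
}

$limiter = new SimpleRateLimiter(0.5); // at most one request every 0.5 seconds
foreach (['https://example.com/page1', 'https://example.com/page2'] as $url) {
    $limiter->wait();
    echo "Fetching {$url}\n"; // a real scraper would send the request here
}
```

A natural extension is to keep one limiter per host, so that requests to different sites do not slow each other down.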

Queue System: Scraping Multiple URLs

The queue system lets you scrape multiple URLs efficiently, like a conveyor belt processing orders. Create queue_scrape.php:

Run it:

Expected Output:

Line-by-Line Explanation:

Why This is Powerful:
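As a rough model of a queue, the sketch below processes URLs first in, first out with PHP's SplQueue. It skips duplicate URLs and records failures without stopping the run. processQueue() and the stand-in scraper are hypothetical. Unlike the asynchronous scraping the library advertises, this version handles one URL at a time.

```php
<?php
// Plain-PHP sketch, not the library's queue API: process URLs in FIFO order,
// skip duplicates, and collect a result or an error message per URL.
function processQueue(array $urls, callable $scrape): array
{
    $queue = new SplQueue();
    $seen = [];
    foreach ($urls as $url) {
        if (!isset($seen[$url])) {
            $seen[$url] = true;
            $queue->enqueue($url);
        }
    }

    $results = [];
    while (!$queue->isEmpty()) {
        $url = $queue->dequeue();
        try {
            $results[$url] = ['ok' => true, 'data' => $scrape($url)];
        } catch (Throwable $e) {
            // One failing URL should not stop the rest of the queue.
            $results[$url] = ['ok' => false, 'error' => $e->getMessage()];
        }
    }
    return $results;
}

// Stand-in scraper so the example runs offline; swap in a real fetch and parse.
$fakeScrape = function (string $url): string {
    if (str_contains($url, 'broken')) {
        throw new RuntimeException("Could not fetch {$url}");
    }
    return 'Title of ' . $url;
};

print_r(processQueue(['https://example.com/a', 'https://example.com/broken', 'https://example.com/a'], $fakeScrape));
```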

API Integration: Combining Scraping with APIs

You can fetch data from APIs to complement your scraped data, like adding extra toppings to a pizza. Create api_scrape.php:

Run it:

Expected Output:

Explanation:

Use Case:
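Here is a minimal sketch of the merging step, assuming the API returns a JSON object: decode it and combine it with the scraped fields. enrichWithApi() and the field names are hypothetical. The design choice here is that scraped values win when both sources have the same key.

```php
<?php
// Plain-PHP sketch, not the library's API: merge fields from a JSON API
// response into data scraped from a page.
function enrichWithApi(array $scraped, string $apiJson): array
{
    $api = json_decode($apiJson, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($api)) {
        throw new UnexpectedValueException('API response is not a JSON object');
    }
    // Array union: keys already in $scraped are kept; the API only adds new keys.
    return $scraped + $api;
}

// In practice $apiJson would come from an HTTP request (for example with
// Guzzle); a literal string keeps this example offline.
$scraped = ['title' => 'Blue Pen', 'price' => '1.50'];
$apiJson = '{"price": "1.40", "stock": 12}';
print_r(enrichWithApi($scraped, $apiJson));
```

If the API should be treated as the more reliable source, swap the operands to $api + $scraped.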

Custom CSS Selectors

Want to extract something specific, like a div with class content? Use filter():

Explanation:
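filter() appears to take a CSS selector. The package depends on symfony/css-selector, a component that translates CSS selectors into XPath. The sketch below shows roughly what div.content becomes in XPath, run with plain DOMXPath. selectByClass() is a hypothetical helper, and it does not escape its arguments, so only pass trusted values.

```php
<?php
// Plain-PHP sketch: the XPath equivalent of the CSS selector "div.content".
// The concat/normalize-space trick matches "content" as a whole class name,
// so class="main content" matches but class="contents" does not.
function selectByClass(string $html, string $tag, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // No escaping: $tag and $class must be trusted, simple identifiers.
    $query = sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $class
    );

    $texts = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = '<div class="main content">Keep me</div><div class="contents">Skip me</div>';
print_r(selectByClass($html, 'div', 'content'));
```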


Plugins: Supercharging Your Scraper

Plugins are like apps you install on your phone to add new features. They let you extend Advance PHP Scraper without changing its core code.

What Are Plugins?

A plugin is a PHP class that adds functionality, like rendering JavaScript pages or caching responses. Plugins live in src/Plugins/custom/ and are managed via plugins.json. You can enable/disable them or create your own.
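PLUGIN_README.md, not this page, covers how the library actually loads plugins. To illustrate the general idea of enabling and disabling plugins from a JSON file, here is a self-contained sketch. PluginInterface, PluginRegistry, the demo plugins, and the {"Name": true} JSON shape are all hypothetical; they are not the library's plugin contract.

```php
<?php
// Hypothetical plugin contract; the library's real interface may differ.
interface PluginInterface
{
    public function getName(): string;
}

final class PluginRegistry
{
    /** @var array<string, PluginInterface> */
    private array $plugins = [];

    /** @var array<string, bool> */
    private array $enabledFlags;

    // $configJson mimics a plugins.json file, e.g. {"DemoCachePlugin": true}.
    public function __construct(string $configJson)
    {
        $this->enabledFlags = json_decode($configJson, true, 512, JSON_THROW_ON_ERROR);
    }

    public function register(PluginInterface $plugin): void
    {
        $this->plugins[$plugin->getName()] = $plugin;
    }

    /** @return list<string> registered plugins that the config switches on */
    public function enabled(): array
    {
        $names = array_filter(
            array_keys($this->plugins),
            fn (string $name) => ($this->enabledFlags[$name] ?? false) === true
        );
        return array_values($names);
    }
}

final class DemoCachePlugin implements PluginInterface
{
    public function getName(): string
    {
        return 'DemoCachePlugin';
    }
}

final class DemoLoggerPlugin implements PluginInterface
{
    public function getName(): string
    {
        return 'DemoLoggerPlugin';
    }
}

$registry = new PluginRegistry('{"DemoCachePlugin": true, "DemoLoggerPlugin": false}');
$registry->register(new DemoCachePlugin());
$registry->register(new DemoLoggerPlugin());
print_r($registry->enabled()); // only DemoCachePlugin is switched on
```

Keeping the on/off switches in a config file means a plugin can be disabled without touching code, which matches the enable/disable workflow described above.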

Available Plugins

The library includes six plugins, each explained in detail in the PLUGIN_README.md. Here’s a quick overview:

How to Use Plugins

To use a plugin, enable it and call its methods. Example with CachePlugin:

For a complete guide on plugins, including how to enable, disable, or create them, check out the PLUGIN_README.md.


Configuration: Customizing Your Scraper

You can tweak the scraper’s settings to fit your needs, like adjusting a car’s mirrors before driving.

Setting User Agent

The user agent tells servers who is making the request (like showing your ID at a library). The default is a bot-like string, but you can mimic a browser:

Adjusting Timeout

Set how long the scraper waits for a response:

Following Redirects

Choose whether to follow HTTP redirects:

Using Constructor Configuration

Pass settings when creating the scraper:

Explanation:
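This page does not show the constructor's option names. To illustrate the same three settings outside the library, the sketch below expresses them as options for PHP's built-in HTTP stream wrapper. user_agent, timeout, follow_location, and max_redirects are real http context options; makeHttpContext() and its own option keys are hypothetical.

```php
<?php
// Plain-PHP sketch using PHP's HTTP stream wrapper, not the library's config
// API: user agent, timeout and redirect handling as stream context options.
function makeHttpContext(array $options = [])
{
    $defaults = [
        'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)', // browser-like UA
        'timeout' => 10.0,          // read timeout in seconds
        'follow_redirects' => true,
    ];
    $config = array_merge($defaults, $options);

    return stream_context_create([
        'http' => [
            'user_agent' => $config['user_agent'],
            'timeout' => (float) $config['timeout'],
            // 1 follows Location headers; 0 returns the 3xx response itself.
            'follow_location' => $config['follow_redirects'] ? 1 : 0,
            'max_redirects' => 5,
        ],
    ]);
}

$context = makeHttpContext(['timeout' => 5]);
// $html = file_get_contents('https://example.com', false, $context);
print_r(stream_context_get_options($context));
```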


Testing: Ensuring Everything Works

The library comes with tests that check it works as expected. Think of tests as a quality check, like tasting a cake before serving it.

Running Tests

Install development dependencies:

Run tests:

Expected Output:

Writing Your Own Tests

Add tests in tests/. Example for a custom method:


Troubleshooting: Solving Common Problems

Even the best tools can hit snags. Here’s how to fix common issues:

Installation Issues

Scraping Errors

Plugin Problems


Contributing: Joining the Community

Love the library? Help make it better! Contribute by fixing bugs, adding features, or improving docs. Read the CONTRIBUTING.md for a detailed guide.


License: Understanding Usage Rights

Advance PHP Scraper is licensed under the MIT License, meaning you can use, modify, and share it freely as long as you keep the copyright and license notice. See the LICENSE file for details.


Resources: Further Learning



All versions of advance-phpscraper with dependencies

Requires:
  • php ^8.0
  • symfony/browser-kit ^5.4
  • symfony/dom-crawler ^5.4
  • symfony/css-selector ^5.4
  • guzzlehttp/guzzle ^7.0
  • symfony/event-dispatcher ^5.4
  • symfony/console ^5.4
  • symfony/mime ^5.4
  • monolog/monolog ^2.0
  • league/uri ^6.5
  • donatello-za/rake-php-plus ^1.0.3
  • intervention/image ^2.7
  • PHP extensions (any version, *): ext-dom, ext-libxml, ext-gd, ext-simplexml, ext-mbstring, ext-curl, ext-fileinfo, ext-xml, ext-zlib, ext-json, ext-iconv, ext-pcre, ext-ctype, ext-xmlwriter, ext-tokenizer, ext-filter, ext-xmlreader, ext-sockets
