Download the PHP package webcrawlerapi/sdk without Composer
On this page you can find all versions of the PHP package webcrawlerapi/sdk. These versions can be downloaded and installed without Composer; possible dependencies are resolved automatically.
Package: sdk
Short Description: A PHP SDK for WebCrawler API - turn websites into data
License: MIT
Homepage: https://github.com/webcrawlerapi/webcrawlerapi-php-sdk
Information about the package sdk
WebCrawler API PHP SDK
A PHP SDK for interacting with the WebCrawlerAPI - a powerful web crawling and scraping service.
To use the API, you need an API key from WebCrawlerAPI.
Read the documentation at WebCrawlerAPI Docs for more information.
Requirements
- PHP 8.0 or higher
- Composer
- ext-json PHP extension
- Guzzle HTTP Client 7.0 or higher
Installation
You can install the package via composer:
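```bash
composer require webcrawlerapi/sdk
```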
Usage
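The exact client class name and constructor signature are not shown on this page, so the following is a minimal sketch. It assumes the package is autoloaded via Composer, that the SDK exposes a WebCrawlerAPI\WebCrawlerAPI client constructed with your API key, and that the crawl parameters documented below are passed as named arguments.

```php
<?php

require 'vendor/autoload.php';

use WebCrawlerAPI\WebCrawlerAPI;

// Class name and constructor are assumptions; check the SDK source for the exact API.
$client = new WebCrawlerAPI('your_api_key');

// Start a crawl and wait for it to complete (see crawl() below).
$job = $client->crawl(
    url: 'https://example.com',
    scrapeType: 'markdown',
    itemsLimit: 10
);

echo "Job {$job->id} finished with status {$job->status}\n";
```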
API Methods
crawl()
Starts a new crawling job and waits for its completion. This method will continuously poll the job status until:
- The job reaches a terminal state (done, error, or cancelled)
- The maximum number of polls is reached (default: 100)
The polling interval is determined by the server's recommendedPullDelayMs or defaults to 5 seconds.
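A sketch of a blocking crawl, reusing the assumed client from the Usage example above (class name and named-argument style are assumptions):

```php
// Blocks until the job is done, errored, cancelled, or maxPolls is exhausted.
$job = $client->crawl(
    url: 'https://example.com',
    scrapeType: 'cleaned',
    itemsLimit: 20,
    maxPolls: 50      // stop polling after 50 status checks instead of the default 100
);

echo $job->status . "\n";   // "done", "error", or "cancelled"
```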
crawlAsync()
Starts a new crawling job and returns immediately with a job ID. Use this when you want to handle polling and status checks yourself, or when using webhooks.
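A sketch, again with the assumed client; the webhook URL is a hypothetical placeholder:

```php
// Returns immediately; only the job ID is available at this point.
$response = $client->crawlAsync(
    url: 'https://example.com',
    webhookUrl: 'https://yourserver.com/crawl-webhook'  // hypothetical endpoint
);

echo "Started job {$response->id}\n";
```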
getJob()
Retrieves the current status and details of a specific job.
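A sketch, assuming getJob() takes the job ID returned by crawlAsync():

```php
$job = $client->getJob($response->id);

echo "Status: {$job->status}\n";
echo "Items so far: " . count($job->jobItems) . "\n";
```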
cancelJob()
Cancels a running job. Any items that are not in progress or already completed will be marked as canceled and will not be charged.
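A sketch, assuming cancelJob() likewise takes the job ID:

```php
// Items not yet processed are marked as canceled and are not charged.
$client->cancelJob($job->id);
```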
Parameters
Crawl Methods (crawl and crawlAsync)
- url (required): The seed URL where the crawler starts. Can be any valid URL.
- scrapeType (default: "html"): The type of scraping you want to perform. Can be "html", "cleaned", or "markdown".
- itemsLimit (default: 10): The crawler will stop when it reaches this limit of pages for this job.
- webhookUrl (optional): The URL where the server will send a POST request once the task is completed.
- allowSubdomains (default: false): If true, the crawler will also crawl subdomains.
- whitelistRegexp (optional): A regular expression to whitelist URLs. Only URLs that match the pattern will be crawled.
- blacklistRegexp (optional): A regular expression to blacklist URLs. URLs that match the pattern will be skipped.
- maxPolls (optional, crawl only): Maximum number of status checks before returning (default: 100). An example combining several of these parameters follows this list.
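Putting several parameters together in one call (a sketch; the regular-expression values and webhook URL are hypothetical, and the named-argument style follows the assumption made in the Usage example):

```php
$job = $client->crawl(
    url: 'https://example.com',
    scrapeType: 'markdown',
    itemsLimit: 50,
    allowSubdomains: true,
    whitelistRegexp: '/blog/',       // hypothetical pattern: only crawl blog URLs
    blacklistRegexp: '\?page=',      // hypothetical pattern: skip paginated URLs
    webhookUrl: 'https://yourserver.com/crawl-webhook'  // hypothetical endpoint
);
```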
Responses
CrawlAsync Response
The crawlAsync() method returns a CrawlResponse object with:
- id: The unique identifier of the created job
Job Response
The Job object contains detailed information about the crawling job:
- id: The unique identifier of the job
- orgId: Your organization identifier
- url: The seed URL where the crawler started
- status: The status of the job (new, in_progress, done, error)
- scrapeType: The type of scraping performed
- createdAt: The date when the job was created
- finishedAt: The date when the job was finished (if completed)
- webhookUrl: The webhook URL for notifications
- webhookStatus: The status of the webhook request
- webhookError: Any error message if the webhook request failed
- jobItems: Array of JobItem objects representing crawled pages
- recommendedPullDelayMs: Server-recommended delay between status checks
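For example, inspecting a job using the field names listed above (a sketch; $jobId stands in for an ID obtained from crawlAsync()):

```php
$job = $client->getJob($jobId);

echo "Job {$job->id} for {$job->url}\n";
echo "Status: {$job->status} ({$job->scrapeType})\n";

if ($job->status === 'done') {
    echo 'Crawled ' . count($job->jobItems) . " pages\n";
}
```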
JobItem Properties
Each JobItem object represents a crawled page and contains:
- id: The unique identifier of the item
- jobId: The parent job identifier
- job: Reference to the parent Job object
- originalUrl: The URL of the page
- pageStatusCode: The HTTP status code of the page request
- status: The status of the item (new, in_progress, done, error)
- title: The page title
- createdAt: The date when the item was created
- cost: The cost of the item in $
- referredUrl: The URL where the page was referred from
- lastError: Any error message if the item failed
- getContent(): Method to get the page content based on the job's scrapeType (html, cleaned, or markdown). Returns null if the item's status is not "done" or if content is not available. Content is automatically fetched and cached when accessed.
- rawContentUrl: URL to the raw content (if available)
- cleanedContentUrl: URL to the cleaned content (if scrapeType is "cleaned")
- markdownContentUrl: URL to the markdown content (if scrapeType is "markdown")
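Iterating over a finished job's items and reading their content (a sketch using the properties listed above):

```php
foreach ($job->jobItems as $item) {
    if ($item->status !== 'done') {
        continue;
    }

    echo "{$item->title} ({$item->originalUrl}) - HTTP {$item->pageStatusCode}\n";

    // getContent() fetches and caches the content in the job's scrapeType format;
    // it returns null if the content is not available.
    $content = $item->getContent();
    if ($content !== null) {
        echo substr($content, 0, 200) . "\n";
    }
}
```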
License
MIT License