Libraries tagged by Web Extractor

jkphl/micrometa

119 Favers
149334 Downloads

A meta parser for extracting micro information out of web documents, currently supporting Microformats 1+2, HTML Microdata, RDFa Lite 1.1 and JSON-LD

Go to Download


crawlzone/crawlzone

78 Favers
5129 Downloads

Crawlzone is a fast asynchronous internet crawling framework aiming to provide open source web search and testing solution. It can be used for a wide range of purposes, from extracting and indexing structured data to monitoring and automated testing.

Go to Download


aspose/pdf-sdk-php

9 Favers
19701 Downloads

Aspose.PDF Cloud is a REST API for creating and editing PDF files. It can also be used to convert PDF files to different formats like DOC, HTML, XPS, TIFF and many more. Aspose.PDF Cloud gives you control: create PDFs from scratch or from HTML, XML, template, database, XPS or an image. Render PDFs to image formats such as JPEG, PNG, GIF, BMP, TIFF and many others. Aspose.PDF Cloud helps you manipulate elements of a PDF file like text, annotations, watermarks, signatures, bookmarks, stamps and so on. Its REST API also allows you to manage PDF pages by using features like merging, splitting, and inserting. Add images to a PDF file or convert PDF pages to images.

Go to Download


fg/essence

734 Favers
53565 Downloads

Extracts information about medias on the web, like youtube videos, twitter statuses or blog articles.

Go to Download


pdfgeneratorapi/php-client

5 Favers
191930 Downloads

# Introduction [PDF Generator API](https://pdfgeneratorapi.com) allows you easily generate transactional PDF documents and reduce the development and support costs by enabling your users to create and manage their document templates using a browser-based drag-and-drop document editor. The PDF Generator API features a web API architecture, allowing you to code in the language of your choice. This API supports the JSON media type, and uses UTF-8 character encoding. ## Base URL The base URL for all the API endpoints is `https://us1.pdfgeneratorapi.com/api/v4` For example * `https://us1.pdfgeneratorapi.com/api/v4/templates` * `https://us1.pdfgeneratorapi.com/api/v4/workspaces` * `https://us1.pdfgeneratorapi.com/api/v4/templates/123123` ## Editor PDF Generator API comes with a powerful drag & drop editor that allows to create any kind of document templates, from barcode labels to invoices, quotes and reports. You can find tutorials and videos from our [Support Portal](https://support.pdfgeneratorapi.com). * [Component specification](https://support.pdfgeneratorapi.com/en/category/components-1ffseaj/) * [Expression Language documentation](https://support.pdfgeneratorapi.com/en/category/expression-language-q203pa/) * [Frequently asked questions and answers](https://support.pdfgeneratorapi.com/en/category/qanda-1ov519d/) ## Definitions ### Organization Organization is a group of workspaces owned by your account. ### Workspace Workspace contains templates. Each workspace has access to their own templates and organization default templates. ### Master Workspace Master Workspace is the main/default workspace of your Organization. The Master Workspace identifier is the email you signed up with. ### Default Template Default template is a template that is available for all workspaces by default. You can set the template access type under Page Setup. If template has "Organization" access then your users can use them from the "New" menu in the Editor. ### Data Field Data Field is a placeholder for the specific data in your JSON data set. In this example JSON you can access the buyer name using Data Field `{paymentDetails::buyerName}`. The separator between depth levels is :: (two colons). When designing the template you don’t have to know every Data Field, our editor automatically extracts all the available fields from your data set and provides an easy way to insert them into the template. ``` { "documentNumber": 1, "paymentDetails": { "method": "Credit Card", "buyerName": "John Smith" }, "items": [ { "id": 1, "name": "Item one" } ] } ``` ## Rate limiting Our API endpoints use IP-based rate limiting and allow you to make up to 2 requests per second and 60 requests per minute. If you make more requests, you will receive a response with HTTP code 429. Response headers contain additional values: | Header | Description | |--------|--------------------------------| | X-RateLimit-Limit | Maximum requests per minute | | X-RateLimit-Remaining | The requests remaining in the current minute | | Retry-After | How many seconds you need to wait until you are allowed to make requests | * * * * * # Libraries and SDKs ## Postman Collection We have created a [Postman Collection](https://www.postman.com/pdfgeneratorapi/workspace/pdf-generator-api-public-workspace/overview) so you can easily test all the API endpoints without developing and code. You can download the collection [here](https://www.postman.com/pdfgeneratorapi/workspace/pdf-generator-api-public-workspace/collection/11578263-42fed446-af7e-4266-84e1-69e8c1752e93). ## Client Libraries All our Client Libraries are auto-generated using [OpenAPI Generator](https://openapi-generator.tech/) which uses the OpenAPI v3 specification to automatically generate a client library in specific programming language. * [PHP Client](https://github.com/pdfgeneratorapi/php-client) * [Java Client](https://github.com/pdfgeneratorapi/java-client) * [Ruby Client](https://github.com/pdfgeneratorapi/ruby-client) * [Python Client](https://github.com/pdfgeneratorapi/python-client) * [Javascript Client](https://github.com/pdfgeneratorapi/javascript-client) We have validated the generated libraries, but let us know if you find any anomalies in the client code. * * * * * # Authentication The PDF Generator API uses __JSON Web Tokens (JWT)__ to authenticate all API requests. These tokens offer a method to establish secure server-to-server authentication by transferring a compact JSON object with a signed payload of your account’s API Key and Secret. When authenticating to the PDF Generator API, a JWT should be generated uniquely by a __server-side application__ and included as a __Bearer Token__ in the header of each request. ## Accessing your API Key and Secret You can find your __API Key__ and __API Secret__ from the __Account Settings__ page after you login to PDF Generator API [here](https://pdfgeneratorapi.com/login). ## Creating a JWT JSON Web Tokens are composed of three sections: a header, a payload (containing a claim set), and a signature. The header and payload are JSON objects, which are serialized to UTF-8 bytes, then encoded using base64url encoding. The JWT's header, payload, and signature are concatenated with periods (.). As a result, a JWT typically takes the following form: ``` {Base64url encoded header}.{Base64url encoded payload}.{Base64url encoded signature} ``` We recommend and support libraries provided on [jwt.io](https://jwt.io/). While other libraries can create JWT, these recommended libraries are the most robust. ### Header Property `alg` defines which signing algorithm is being used. PDF Generator API users HS256. Property `typ` defines the type of token and it is always JWT. ``` { "alg": "HS256", "typ": "JWT" } ``` ### Payload The second part of the token is the payload, which contains the claims or the pieces of information being passed about the user and any metadata required. It is mandatory to specify the following claims: * issuer (`iss`): Your API key * subject (`sub`): Workspace identifier * expiration time (`exp`): Timestamp (unix epoch time) until the token is valid. It is highly recommended to set the exp timestamp for a short period, i.e. a matter of seconds. This way, if a token is intercepted or shared, the token will only be valid for a short period of time. ``` { "iss": "ad54aaff89ffdfeff178bb8a8f359b29fcb20edb56250b9f584aa2cb0162ed4a", "sub": "[email protected]", "exp": 1586112639 } ``` ### Signature To create the signature part you have to take the encoded header, the encoded payload, a secret, the algorithm specified in the header, and sign that. The signature is used to verify the message wasn't changed along the way, and, in the case of tokens signed with a private key, it can also verify that the sender of the JWT is who it says it is. ``` HMACSHA256( base64UrlEncode(header) + "." + base64UrlEncode(payload), API_SECRET) ``` ### Putting all together The output is three Base64-URL strings separated by dots. The following shows a JWT that has the previous header and payload encoded, and it is signed with a secret. ``` eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJhZDU0YWFmZjg5ZmZkZmVmZjE3OGJiOGE4ZjM1OWIyOWZjYjIwZWRiNTYyNTBiOWY1ODRhYTJjYjAxNjJlZDRhIiwic3ViIjoiZGVtby5leGFtcGxlQGFjdHVhbHJlcG9ydHMuY29tIn0.SxO-H7UYYYsclS8RGWO1qf0z1cB1m73wF9FLl9RCc1Q // Base64 encoded header: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9 // Base64 encoded payload: eyJpc3MiOiJhZDU0YWFmZjg5ZmZkZmVmZjE3OGJiOGE4ZjM1OWIyOWZjYjIwZWRiNTYyNTBiOWY1ODRhYTJjYjAxNjJlZDRhIiwic3ViIjoiZGVtby5leGFtcGxlQGFjdHVhbHJlcG9ydHMuY29tIn0 // Signature: SxO-H7UYYYsclS8RGWO1qf0z1cB1m73wF9FLl9RCc1Q ``` ## Temporary JWTs You can create a temporary token in [Account Settings](https://pdfgeneratorapi.com/account/organization) page after you login to PDF Generator API. The generated token uses your email address as the subject (`sub`) value and is valid for __15 minutes__. You can also use [jwt.io](https://jwt.io/) to generate test tokens for your API calls. These test tokens should never be used in production applications. * * * * * # Error codes | Code | Description | |--------|--------------------------------| | 401 | Unauthorized | | 402 | Payment Required | | 403 | Forbidden | | 404 | Not Found | | 422 | Unprocessable Entity | | 429 | Too Many Requests | | 500 | Internal Server Error | ## 401 Unauthorized | Description | |-------------------------------------------------------------------------| | Authentication failed: request expired | | Authentication failed: workspace missing | | Authentication failed: key missing | | Authentication failed: property 'iss' (issuer) missing in JWT | | Authentication failed: property 'sub' (subject) missing in JWT | | Authentication failed: property 'exp' (expiration time) missing in JWT | | Authentication failed: incorrect signature | ## 402 Payment Required | Description | |-------------------------------------------------------------------------| | Your account is suspended, please upgrade your account | ## 403 Forbidden | Description | |-------------------------------------------------------------------------| | Your account has exceeded the monthly document generation limit. | | Access not granted: You cannot delete master workspace via API | | Access not granted: Template is not accessible by this organization | | Your session has expired, please close and reopen the editor. | ## 404 Entity not found | Description | |-------------------------------------------------------------------------| | Entity not found | | Resource not found | | None of the templates is available for the workspace. | ## 422 Unprocessable Entity | Description | |-------------------------------------------------------------------------| | Unable to parse JSON, please check formatting | | Required parameter missing | | Required parameter missing: template definition not defined | | Required parameter missing: template not defined | ## 429 Too Many Requests | Description | |-------------------------------------------------------------------------| | You can make up to 2 requests per second and 60 requests per minute. | * * * * *

Go to Download


g4/aura-router

0 Favers
7137 Downloads

The Aura Router package implements web routing; given a URI path and a copy of $_SERVER, it will extract controller, action, and parameter values for a specific route.

Go to Download


yarri/essence

0 Favers
1320 Downloads

Extracts information about medias on the web, like youtube videos, twitter statuses or blog articles.

Go to Download


the-3labs-team/laravel-readability

17 Favers
90 Downloads

Laravel Readability is a supercharged PHP client that makes it easy to extract and manipulate the main content of a web page.

Go to Download


piedweb/text-analyzer

0 Favers
21087 Downloads

Semantic Analysis : Extract Expressions from a text and order it by density.

Go to Download


instaclick/translation-editor-bundle

20 Favers
19506 Downloads

Web editor UI to manage Symfony2 translations (Symfony2 bundle)

Go to Download


tacman/graby

0 Favers
41 Downloads

Graby helps you extract article content from web pages, fork of j0k3r/graby

Go to Download


yoozi/miner

15 Favers
76 Downloads

PHP library to extract the metadata from a public web page and/or summarize it.

Go to Download


diglin/oro-deepl

2 Favers
643 Downloads

Allow to extract non translated strings from OroPlatform applications and translate them through deepl web service. A Deepl developer account is required

Go to Download


sters/extract-content

4 Favers
887 Downloads

Extract web articles

Go to Download


elozoya/media-from-web

0 Favers
22 Downloads

Find media from web urls

Go to Download


<< Previous Next >>