Download the PHP package mensbeam/html-parser without Composer

On this page you can find all versions of the PHP package mensbeam/html-parser. You can download and install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After the download, you need a single include: require_once('vendor/autoload.php');. After that, import the classes you need with use statements.

Example:
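As a minimal sketch of those two steps, assuming the downloaded vendor folder sits next to the script and contains mensbeam/html-parser (the sample markup is purely illustrative):

```php
<?php
// Load the autoloader from the downloaded vendor folder.
// Assumption: this script lives in the directory that contains "vendor".
require_once __DIR__ . '/vendor/autoload.php';

// Import the class so it can be referenced by its short name.
use MensBeam\HTML\Parser;

// Parse an illustrative document; parse() returns a Parser\Output object.
$output = Parser::parse('<!DOCTYPE html><title>Demo</title><p>Hello');
echo get_class($output), "\n";
```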
If you use only one package, a project is not needed. But if you use more than one package, you need a project; without one it is not possible to import the classes with use statements.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, please enter the credentials in the auth.json textarea.

  • Some hosting environments offer no terminal or SSH access, which makes it impossible to use Composer.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package html-parser

HTML-Parser

A modern, accurate HTML parser and serializer for PHP.

Usage

Parsing documents

The MensBeam\HTML\Parser::parse static method is used to parse documents. An arbitrary string and optional encoding are taken as input, and a MensBeam\HTML\Parser\Output object is returned as output. The Output object has the following properties:

Extra configuration parameters may be given to the parser by passing a MensBeam\HTML\Parser\Config object as the final $config argument. See the Configuration section below for more details.
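A minimal sketch of a parse() call, assuming the library has been installed into a vendor folder next to the script. The property names document, encoding and quirksMode are not listed on this page and are assumed here; the input markup is illustrative only.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

// Illustrative input: a small document with an unclosed <p>.
$html = '<!DOCTYPE html><title>Demo</title><p>Hello, world';

// Second argument: optional encoding. Third: optional Config (omitted here).
$output = Parser::parse($html, 'UTF-8');

// Assumed property names on the Output object: document, encoding, quirksMode.
$document = $output->document;
echo $document->getElementsByTagName('p')->item(0)->textContent, "\n";
echo $output->encoding, "\n";
var_dump($output->quirksMode === Parser::NO_QUIRKS_MODE);
```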

Parsing with DOMParser

Since version 1.3.0, the library also provides an implementation of the DOMParser interface.
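A rough sketch of how such a parser might be used. The class name MensBeam\HTML\DOMParser and the parseFromString(string, type) method are assumptions modelled on the standard DOMParser interface; this page does not show the signature, so check the library before relying on it.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

// Assumption: the implementation lives in MensBeam\HTML\DOMParser and,
// like the standard interface, exposes parseFromString(string, type).
use MensBeam\HTML\DOMParser;

$parser = new DOMParser();

// Parse an HTML document and an XML document with the same parser object.
$htmlDoc = $parser->parseFromString('<!DOCTYPE html><p>Hello', 'text/html');
$xmlDoc = $parser->parseFromString('<root><item/></root>', 'application/xml');

echo $htmlDoc->getElementsByTagName('p')->length, "\n";
echo $xmlDoc->documentElement->nodeName, "\n";
```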

Like the standard interface, it will parse either HTML or XML documents. This implementation does, however, differ in the following ways:

Parsing into existing documents

The MensBeam\HTML\Parser::parseInto static method is used to parse into an existing document. The supplied document must be an instance of (or derived from) \DOMDocument and also must be empty. All other arguments are identical to those used when parsing documents normally.

NOTE: The documentClass configuration option has no effect when using this method.
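A minimal sketch of parsing into an existing, empty document. The argument order (data first, then the document) is an assumption, since this page does not show the signature; the input markup is illustrative.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

// The target must be an empty \DOMDocument (or a subclass of it).
$document = new \DOMDocument();

// Assumed argument order: the string to parse, then the target document.
// The optional encoding and Config arguments are omitted here.
$output = Parser::parseInto('<!DOCTYPE html><p>Hello', $document);

// The parsed tree now lives in the document supplied by the caller.
echo $document->documentElement->nodeName, "\n";
```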

Parsing fragments

The MensBeam\HTML\Parser::parseFragment static method is used to parse document fragments. The primary use case for this method is in the implementation of the innerHTML setter of HTML elements. Consequently, a context element is required, as well as the "quirks mode" property of the context element's document (which must be one of Parser::NO_QUIRKS_MODE (0), Parser::QUIRKS_MODE (1), or Parser::LIMITED_QUIRKS_MODE (2)). The further arguments are identical to those used when parsing documents.

If the "quirks mode" property of the document is not known, using Parser::NO_QUIRKS_MODE (0) is usually the best choice.

Unlike the parse() method, the parseFragment() method returns a DOMDocumentFragment object belonging to $contextElement's owner document.
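The innerHTML use case might be sketched as follows. The helper name setInnerHtml is hypothetical, and the document property used to obtain a DOMDocument from parse() is an assumed name; the argument order follows the description above (context element, quirks mode, then the data).

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

// Hypothetical helper emulating an innerHTML setter for a DOM element.
function setInnerHtml(\DOMElement $element, string $html): void
{
    // Parse the markup in the context of the target element.
    $fragment = Parser::parseFragment($element, Parser::NO_QUIRKS_MODE, $html);

    // Remove the existing children, then attach the parsed nodes.
    while ($element->firstChild !== null) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($fragment);
}

// Assumed property name "document" on the Output object.
$document = Parser::parse('<!DOCTYPE html><div>old</div>')->document;
$div = $document->getElementsByTagName('div')->item(0);

setInnerHtml($div, '<b>bold</b> and <i>italic</i>');
echo $div->textContent, "\n";
```

NO_QUIRKS_MODE is used here because, as noted above, it is usually the best choice when the document's quirks mode is not known.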

Serializing nodes

The MensBeam\HTML\Parser::serialize method can be used to convert most DOMNode objects into strings, using the basic algorithm defined in the HTML specification. Nodes of the following types can be successfully serialized:

Similarly, the MensBeam\HTML\Parser::serializeInner method can be used to convert the children of non-leaf DOMNode objects into strings, using the basic algorithm defined in the HTML specification. Children of nodes of the following types can be successfully serialized:

The serialization methods use an associative array for configuration, and the possible keys and value types are:

Examples
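The library's own examples are not reproduced on this page. As a stand-in, here is a minimal sketch that serializes a parsed document and the children of its body element, without passing a configuration array. The document property on the Output object is an assumed name, and the markup is illustrative.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;

// Assumed property name "document" on the Output object.
$document = Parser::parse('<!DOCTYPE html><title>Demo</title><p>Hello')->document;

// Serialize the whole document, including the implied html, head and body.
$full = Parser::serialize($document);
echo $full, "\n";

// Serialize only the children of <body>, similar to reading innerHTML.
$body = $document->getElementsByTagName('body')->item(0);
$inner = Parser::serializeInner($body);
echo $inner, "\n";
```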

Configuration

The MensBeam\HTML\Parser\Config class is used as a container for configuration parameters for the parser. We have tried to use rational defaults, but some parameters are nevertheless configurable:
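The list of parameters is not reproduced on this page. As a sketch only, assuming Config exposes public properties named encodingFallback and htmlNamespace (both names are assumptions, chosen to match the configurable fallback encoding and HTML namespace mentioned in the comparison table further down), a configuration object could be passed like this:

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

use MensBeam\HTML\Parser;
use MensBeam\HTML\Parser\Config;

$config = new Config();

// Assumed property names; verify them against the library's documentation.
$config->encodingFallback = 'UTF-8'; // encoding used when none is detected
$config->htmlNamespace = true;       // place HTML elements in the HTML namespace

// The Config object is passed as the final argument; the encoding is left null.
$output = Parser::parse('<p>No encoding declared', null, $config);

// Assumed property name "document" on the Output object.
echo $output->document->documentElement->namespaceURI, "\n";
```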

Limitations

The primary aim of this library is accuracy. If the document object differs from what the specification mandates, this is probably a bug. However, we are also constrained by PHP, which imposes various limitations. These are as follows:

Comparison with masterminds/html5

This library and masterminds/html5 serve similar purposes. Generally, we are more accurate, but they are much faster. The following table summarizes the main functional differences.

| Feature | DOMDocument | Masterminds | MensBeam |
| Minimum PHP version | 5.0 | 5.3 | 7.1 |
| Extensions required | dom | dom, ctype, mbstring or iconv | dom |
| Target HTML version | HTML 4.01 | HTML 5.0 | WHATWG Living Standard |
| Supported encodings | System-dependent | System-dependent | Per specification |
| Encoding detection | BOM, http-equiv | None | Per specification (Steps 1-5 & 9) |
| Fallback encoding | ISO 8859-1 | UTF-8, configurable | Windows-1252, configurable |
| Handling of invalid characters | Bytes are passed through | Characters are dropped | Per specification |
| Handling of invalid XML element names | Variable | Name is changed to "invalid" | Per specification |
| Handling of invalid XML attribute names | Variable | Attribute is dropped | Per specification |
| Handling of misnested tags | Parent end tags always close children | Parent end tags always close children | Per specification |
| Handling of data between table cells | Left as-is | Left as-is | Per specification |
| Handling of omitted start tags | Elements are not inserted | Elements are not inserted | Per specification |
| Handling of processing instructions | Retained | Retained | Per specification, configurable |
| Handling of bogus XLink namespace* | Foreign content not supported | XLink attributes are lost if preceded by bogus namespace | Bogus namespace is ignored |
| Namespace for HTML elements | Null | Per specification, configurable | Null, configurable |
| Time needed to parse single-page HTML specification | 0.5 seconds | 2.7 seconds† | 6.0 seconds |
| Peak memory needed for same | 11.6 MB | 38 MB | 13.9 MB |

* For example: <svg xmlns:xlink='http://www.w3.org/1999/xhtml' xlink:href='http://example.com/'/>. It is unclear what correct behaviour is, but we believe our behaviour to be more consistent with the intent of the specification.

† With HTML namespace disabled. With HTML namespace enabled it does not finish in a reasonable time due to a PHP bug.


All versions of html-parser with dependencies

Requires:
  • php: >=7.1
  • ext-dom: *
  • mensbeam/intl: >=0.9.1
  • mensbeam/mimesniff: >=0.2.0