Download the PHP package soundasleep/html5lib-php without Composer
On this page you can find all versions of the php package soundasleep/html5lib-php. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package html5lib-php
Short Description A PHP implementation of an HTML parser based on the WHATWG HTML5 specification.
License MIT
Homepage https://github.com/soundasleep/html5lib-php
Information about the package html5lib-php
HTML5Lib - PHP flavour
This is an implementation of the tokenization and tree-building parts of the HTML5 specification in PHP. Potential uses of this library can be found in web-scrapers and HTML filters.
Warning: This is a pre-alpha release, and as such, certain parts of this code are not up to snuff (e.g. error reporting and performance). However, the code is very close to spec and passes 100% of tests not related to parse errors. Nevertheless, expect to have to update your code on the next upgrade.
This fork combines the work of html5lib/html5lib-php and lavoiesl/php-html5lib, and can be installed with Composer through Packagist under the package name soundasleep/html5lib-php.
Usage notes
Documentation
Developer notes
- To set up unit tests, you need to add a small stub file, test-settings.php, that contains $simpletest_location = 'path/to/simpletest/'; SimpleTest itself needs to be version 1.1 (or, until that is released, SVN trunk).
- We don't ultimately want to use PHP's DOM, because it is not tolerant of certain types of errors that HTML 5 allows (for example, an element named "foo@bar"). The current implementation uses it anyway, since it's easy. Eventually, this html5lib implementation will get a version of SimpleTree and may start using that by default.
- The original implementation of this performed line and column tracking in place. However, it was found that this approximately doubled the runtime of tokenization, so we decided to take a more optimistic approach: only calculate line/column numbers when explicitly asked to. This is slower if we attempt to calculate line/column numbers for everything in the document, but if there is a small enough number of errors it is a great improvement.
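
The on-demand approach described in the last note can be illustrated with a minimal sketch. It is not the library's actual code: the class name LazyPosition and its locate() method are hypothetical, and it assumes the tokenizer only records a byte offset for each error and that lines are separated by "\n". The line and column are derived from that offset only when someone asks for them.

```php
<?php
// Hypothetical helper, not part of html5lib-php: it converts a byte offset
// into a 1-based line/column pair only when explicitly asked to.
final class LazyPosition
{
    private string $data;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    /** @return array{line: int, column: int} */
    public function locate(int $offset): array
    {
        // Clamp the offset so out-of-range values cannot break substr().
        $offset = max(0, min($offset, strlen($this->data)));
        $before = substr($this->data, 0, $offset);

        // The line number is the count of newlines before the offset, plus one.
        $line = substr_count($before, "\n") + 1;

        // The column is the distance from the last newline before the offset,
        // or from the start of the input when there is no earlier newline.
        $lastNewline = strrpos($before, "\n");
        $column = $lastNewline === false ? $offset + 1 : $offset - $lastNewline;

        return ['line' => $line, 'column' => $column];
    }
}

// Usage: locate the stray </b> end tag on the second line.
$html = "<p>one</p>\n<p>two</b>\n";
$position = new LazyPosition($html);
$result = $position->locate(strpos($html, '</b>'));
printf("line %d, column %d\n", $result['line'], $result['column']);
```

Each lookup rescans the input up to the offset, so the cost is paid per query rather than per character tokenized. That is the trade-off the note describes: cheap when few positions are requested, slower than in-place tracking when every position is.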