Download the PHP package kevintweber/html-tokenizer without Composer
On this page you can find all versions of the PHP package kevintweber/html-tokenizer. These versions can be downloaded and installed without Composer. Dependencies are resolved automatically.
Package html-tokenizer
Short Description Will tokenize HTML.
License MIT
Homepage https://github.com/kevintweber/HtmlTokenizer
Information about the package html-tokenizer
Html Tokenizer
This package will tokenize HTML input.
Some uses of HTML tokens:
- Tidy/Minify HTML output
- Preprocess HTML
- Filter HTML
- Sanitize HTML
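As a rough illustration of the filtering and sanitizing use cases, the sketch below walks a nested token array and drops comments and `script` elements. The array shape (`type`, `name`, `value`, `children` keys) and the function name `filterTokens` are assumptions made for this example, not the package's documented output format.

```php
<?php
// Hypothetical token shape: ['type' => ..., 'name' => ..., 'children' => [...]].
// It borrows the token type names from this README; the keys are assumed.

/**
 * Recursively drop comment tokens and any element whose name is blocked.
 */
function filterTokens(array $tokens, array $blockedElements = ['script']): array
{
    $kept = [];
    foreach ($tokens as $token) {
        if ($token['type'] === 'comment') {
            continue; // comments are stripped entirely
        }
        if ($token['type'] === 'element'
            && in_array($token['name'] ?? '', $blockedElements, true)) {
            continue; // blocked elements are removed with all their children
        }
        if (isset($token['children'])) {
            $token['children'] = filterTokens($token['children'], $blockedElements);
        }
        $kept[] = $token;
    }
    return $kept;
}

// A hand-written token tree standing in for real tokenizer output.
$tokens = [
    ['type' => 'element', 'name' => 'div', 'children' => [
        ['type' => 'text', 'value' => 'Hello'],
        ['type' => 'comment', 'value' => 'internal note'],
        ['type' => 'element', 'name' => 'script', 'children' => [
            ['type' => 'text', 'value' => 'alert(1)'],
        ]],
    ]],
];

$filtered = filterTokens($tokens);
```

Working on tokens rather than raw strings means the filter never has to guess where an element starts or ends.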
Install
Via Composer, by running `composer require kevintweber/html-tokenizer`.
Usage
Given simple HTML input, the tokenizer produces an array of tokens.
Tokens
The tokens are of the following types:
Name | Example |
---|---|
cdata | \<![CDATA[ Character data goes in here. ]]> |
comment | \<!-- A comment. --> |
doctype | \<!DOCTYPE html> |
element | \<p class="intro"> |
php | \<?php echo $title; ?> |
text | Most of your content will be text. |
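To make the token types above concrete, here is a small sketch that guesses a type name from the way a chunk of markup begins. The helper `guessTokenType` is hypothetical and is not how the package itself parses input; it only mirrors the table.

```php
<?php
/**
 * Map the start of a chunk of markup to one of the token type names.
 * Illustrative only: a real tokenizer tracks state instead of peeking at prefixes.
 */
function guessTokenType(string $chunk): string
{
    $chunk = ltrim($chunk);
    if (strpos($chunk, '<![CDATA[') === 0) {
        return 'cdata';
    }
    if (strpos($chunk, '<!--') === 0) {
        return 'comment';
    }
    if (stripos($chunk, '<!DOCTYPE') === 0) {
        return 'doctype';
    }
    if (strpos($chunk, '<?php') === 0) {
        return 'php';
    }
    if (preg_match('/^<[a-zA-Z]/', $chunk) === 1) {
        return 'element';
    }
    return 'text'; // anything that is not markup falls through to text
}
```

The order of the checks matters: the more specific `<!` and `<?` prefixes are tested before the generic element check.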
Special parsing situations
- Contents of an "iframe" element are not parsed.
- Contents of a "script" element are considered TEXT.
- Contents of a "style" element are considered TEXT.
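The "script" and "style" rules above can be pictured with the following sketch, which takes everything between the opening and closing tag as a single text child instead of parsing it as markup. The function `readRawTextElement` and the returned array keys are assumptions for illustration, not the package's implementation.

```php
<?php
/**
 * Read one raw-text element (such as script or style) from the start of $html.
 * Whatever sits between the tags becomes a single text child, unparsed.
 * Returns null when $html does not start with a closed <$name> element.
 */
function readRawTextElement(string $html, string $name): ?array
{
    $quoted = preg_quote($name, '/');
    $pattern = '/^<' . $quoted . '\b[^>]*>(.*?)<\/' . $quoted . '\s*>/is';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    $children = [];
    if ($matches[1] !== '') {
        $children[] = ['type' => 'text', 'value' => $matches[1]];
    }
    return [
        'type' => 'element',
        'name' => strtolower($name),
        'children' => $children,
    ];
}

// The "<" and the nested <b> tags inside the script stay plain text.
$token = readRawTextElement(
    '<script>if (a < b) { document.write("<b>hi</b>"); }</script>',
    'script'
);
```

Treating such content as text is what keeps a stray `<` inside JavaScript or CSS from being mistaken for the start of an element.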
Limitations
Currently, this package will tokenize HTML5 and XHTML.
It tries to handle errors according to the standard. The tokenizer can handle some (but not all) malformed HTML. You can set the tokenizer to fail silently or throw an exception when it encounters an error. (The default setting is to throw an exception.)
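The two error modes described above follow a common pattern, sketched here with a stand-in class. `CommentChecker` and its `$throwOnError` constructor argument are hypothetical names for this example; consult the package source for the actual class and option.

```php
<?php
/**
 * Hypothetical stand-in showing a throw-or-fail-silently switch.
 * It only checks whether the first HTML comment in the input is closed.
 */
final class CommentChecker
{
    /** @var bool */
    private $throwOnError;

    // Throwing is the default, matching the behaviour this README describes.
    public function __construct(bool $throwOnError = true)
    {
        $this->throwOnError = $throwOnError;
    }

    public function isClosed(string $html): bool
    {
        $start = strpos($html, '<!--');
        if ($start === false || strpos($html, '-->', $start + 4) !== false) {
            return true; // no comment at all, or a properly closed one
        }
        if ($this->throwOnError) {
            throw new RuntimeException('Unclosed comment at offset ' . $start);
        }
        return false; // silent mode: report the problem through the return value
    }
}
```

Strict mode suits input you control, where malformed markup signals a bug; silent mode suits untrusted or scraped input, where partial results are more useful than an exception.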
If you come across valid HTML this package cannot parse, please submit an issue.
Change log
Please see CHANGELOG for more information on what has changed recently.
Testing
Contributing
Please see CONTRIBUTING for details.
Security
If you discover any security-related issues, please email [email protected] instead of using the issue tracker.
Credits
- Kevin Weber
- All Contributors
License
The MIT License (MIT). Please see License File for more information.