Download the PHP package voilab/htmlcleaner without Composer
On this page you can find all versions of the PHP package voilab/htmlcleaner. These versions can be downloaded and installed without Composer. Any dependencies are resolved automatically.
Package: htmlcleaner
Short description: An HTML cleaner based on SimpleXML, fast and customizable
License: MIT
Homepage: http://www.voilab.ch
Information about the package htmlcleaner
Voilab HTML cleaner
An HTML cleaner based on SimpleXML, fast and customizable
Install
Via Composer
Create a composer.json file in your project root that requires the voilab/htmlcleaner package.
Sample dataset
Basic usage
All tags stripped
Allow some tags
Allow some tags and attributes (regardless of tags)
Allow some attributes only on certain tags
Advanced usage
Processors
Processors prepare the HTML string before it is loaded into a new SimpleXMLElement, which is the basis of the cleaning process. They also format the HTML after it has been cleaned. In other words, a processor provides a pre-process step and a post-process step.
The pre-process step must remove all disallowed tags.
Standard processor
The standard processor uses strip_tags() to remove disallowed tags. After the cleaning process, the processor removes all carriage returns from the string.
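The two steps described above can be sketched with plain PHP functions. This is only an illustration, not the package's own code: the function names sketchPreProcess and sketchPostProcess are hypothetical, and the sketch assumes that "carriage returns" covers both \r and \n.

```php
<?php
// Hypothetical helpers for illustration; they are NOT part of voilab/htmlcleaner.

/**
 * Pre-process sketch: drop every tag that is not in the allow-list.
 * $allowedTags is a list of tag names such as ['p', 'strong'].
 */
function sketchPreProcess(string $html, array $allowedTags): string
{
    // strip_tags() accepts its allow-list as a string like "<p><strong>".
    $allowed = '';
    foreach ($allowedTags as $tag) {
        $allowed .= '<' . $tag . '>';
    }
    return strip_tags($html, $allowed);
}

/**
 * Post-process sketch: remove line breaks from the cleaned string.
 * Assumption: both \r and \n are removed.
 */
function sketchPostProcess(string $html): string
{
    return str_replace(["\r\n", "\r", "\n"], '', $html);
}

$raw = "<p>Hello <em>world</em></p>\r\n";
$pre = sketchPreProcess($raw, ['p']);   // <em> is stripped, <p> is kept
echo sketchPostProcess($pre);           // prints: <p>Hello world</p>
```

Note that strip_tags() removes the tags themselves but keeps the text they contained, so the content of a disallowed element still ends up in the output.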
Custom processor
You can create your own processor by implementing \voilab\cleaner\processor\Processor. Do not forget that the pre-process step is responsible for removing all disallowed tags.
Attributes
Attribute classes are used to validate attributes and their content. By default, an allowed attribute becomes a \voilab\cleaner\attribute\Keep. Every attribute that is not allowed becomes a \voilab\cleaner\attribute\Remove.
You do not need to instantiate these two attribute types yourself. All attributes provided as a string in setAllowedTags() are converted to the Keep class.
Js attribute
You may want to keep some attributes but check their content. This is the case for the href attribute, which can contain either a valid URL or a JavaScript injection. An attribute validator already exists for this case.
Note that an allowed attribute may or may not be bound to a specific tag. When no tag is given, the attribute (href, for instance) is valid for every HTML tag. To bind the attribute to a tag, specify the tag as a second parameter.
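To make the href concern concrete, here is a minimal sketch of the kind of check such a validator might perform. It is not the validator shipped with the package: the function name sketchIsSafeHref and the list of rejected schemes are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper for illustration; NOT the validator from voilab/htmlcleaner.

/**
 * Returns false when the attribute value uses a script-capable URL scheme.
 * Assumption: javascript:, vbscript: and data: are the schemes to reject.
 */
function sketchIsSafeHref(string $value): bool
{
    // Whitespace and control characters can be used to disguise a scheme
    // (for example "java\tscript:"), so remove them before checking.
    $normalized = preg_replace('/[\x00-\x20]+/', '', $value) ?? '';

    // Case-insensitive match on the scheme at the start of the value.
    return preg_match('/^(javascript|vbscript|data):/i', $normalized) !== 1;
}

var_dump(sketchIsSafeHref('http://www.voilab.ch'));    // bool(true)
var_dump(sketchIsSafeHref('javascript:alert(1)'));     // bool(false)
var_dump(sketchIsSafeHref(" JaVa\tScript:alert(1)"));  // bool(false)
```

The sketch does not decode character references such as &#106;, so a production check would need to normalize those as well.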
Known limitations
Root mixed content
Mixed content (text outside of tags) is not allowed at the root level.
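The underlying parser behavior can be observed directly with simplexml_load_string(). The snippet below only demonstrates the parser and does not call the package; the variable names are illustrative.

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Text at the root level, next to a tag: not a well-formed XML document.
$mixedRoot = 'Hello <strong>world</strong>';

// The same content wrapped in a single root element.
$wrapped = '<p>Hello <strong>world</strong></p>';

var_dump(simplexml_load_string($mixedRoot) === false);                 // bool(true)
var_dump(simplexml_load_string($wrapped) instanceof SimpleXMLElement); // bool(true)
```

Wrapping the fragment in a single element before cleaning is one possible workaround, assuming the wrapper tag is itself allowed.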
Bad HTML format with Standard processor
If the HTML is not well formed, the cleaner throws an \Exception. The string needs to be perfectly written, because it is processed by simplexml_load_string($html), which is very strict:
- tags must be closed (<p></p> or <br />)
- attributes must be wrapped in (double-)quotes (<hr class="test" />)
- a (double-)quote is not allowed in attribute content; it must be converted to &quot; before HtmlCleaner::clean() is called
- the opening tag character < and the ampersand & are not allowed in content; they must be converted to &lt; and &amp; respectively before HtmlCleaner::clean() is called
These limitations will eventually be addressed in future releases.
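As an illustration of the escaping rules listed above, the following sketch escapes user text with htmlspecialchars() before embedding it in markup and shows how simplexml_load_string() reacts to the escaped and unescaped variants. It does not call the package itself, and the sample strings are made up for the example.

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$text = 'Fish & chips < 5 "euros"';

// ENT_COMPAT converts &, <, > and double quotes to entities.
$escaped = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
// $escaped is now: Fish &amp; chips &lt; 5 &quot;euros&quot;

$bad  = '<p title="' . $text . '">' . $text . '</p>';
$good = '<p title="' . $escaped . '">' . $escaped . '</p>';

// The unescaped markup is rejected by the strict parser.
var_dump(simplexml_load_string($bad) === false); // bool(true)

// The escaped markup parses, and the parser decodes the entities again.
$xml = simplexml_load_string($good);
echo (string) $xml, PHP_EOL;          // Fish & chips < 5 "euros"
echo (string) $xml['title'], PHP_EOL; // Fish & chips < 5 "euros"
```

Escaping should be applied only to text and attribute values, not to the markup as a whole, otherwise the tags themselves would be turned into text.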
Testing
Security
If you discover any security-related issues, please use the issue tracker.
Credits
License
The MIT License (MIT). Please see License File for more information.