Download the PHP package wa72/htmlpagedom without Composer
On this page you can find all versions of the php package wa72/htmlpagedom. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Information about the package htmlpagedom
HtmlPageDom
Wa72\HtmlPageDom is a PHP library for easy manipulation of HTML documents using DOM. It requires DomCrawler from Symfony components for traversing the DOM tree and extends it by adding methods for manipulating the DOM tree of HTML documents.
It's useful when you need not just to extract information from an HTML file (which is what DomCrawler does) but also to modify HTML pages. It is usable as a template engine: load your HTML template file, set new HTML content on certain elements such as the page title, div#content or ul#menu, and print out the modified page.
Wa72\HtmlPageDom consists of two main classes:
- HtmlPageCrawler extends Symfony\Component\DomCrawler\Crawler by adding jQuery-inspired, HTML-specific DOM manipulation functions such as setInnerHtml($htmltext), before(), append(), wrap(), addClass() or css(). It's like jQuery for PHP: simply select elements of an HTML page using CSS selectors and change their attributes and content. See the API doc for HtmlPageCrawler.
- HtmlPage represents one complete HTML document and offers convenience functions like getTitle(), setTitle($title), setMeta('description', $description), getBody(). Internally, it uses the HtmlPageCrawler class for filtering and manipulating DOM elements. Since version 1.2, it offers methods for compressing (minify()) and pretty-printing (indent()) the HTML page. See the API doc for HtmlPage.
Requirements and Compatibility
Version 3.x:
- PHP 8.x
- Symfony\Components\DomCrawler 6.x | 7.x
- Symfony\Components\CssSelector 6.x | 7.x
Version 2.x:
- PHP ^7.4 | 8.x
- Symfony\Components\DomCrawler ^4.4 | 5.x
- Symfony\Components\CssSelector ^4.4 | 5.x
There is no difference in our API between versions 2.x and 3.0.x. The only difference is the compatibility with different versions of Symfony.
Installation
- using Composer:
  composer require wa72/htmlpagedom
- using another PSR-4 compliant autoloader: clone this project into the directory that holds your included libraries and point your autoloader to look for the "\Wa72\HtmlPageDom" namespace in the "src" directory of this project
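For the second option, a minimal PSR-4 autoloader can be sketched with PHP's built-in spl_autoload_register(). The clone location (lib/htmlpagedom/src/), the constant HTMLPAGEDOM_SRC and the helper htmlPageDomClassToFile() are assumptions made for illustration, not part of the library. The Symfony DomCrawler and CssSelector components must be autoloadable as well.

```php
<?php
// Hypothetical location of the cloned project's "src" directory; adjust to your layout.
const HTMLPAGEDOM_SRC = __DIR__ . '/lib/htmlpagedom/src/';

/**
 * Map a class name in the Wa72\HtmlPageDom namespace to a file below $baseDir.
 * Returns null for classes outside that namespace.
 * (Illustrative helper, not part of HtmlPageDom.)
 */
function htmlPageDomClassToFile(string $class, string $baseDir): ?string
{
    $prefix = 'Wa72\\HtmlPageDom\\';
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    // Strip the namespace prefix and turn the remainder into a relative path.
    $relative = substr($class, strlen($prefix));
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

spl_autoload_register(function (string $class): void {
    $file = htmlPageDomClassToFile($class, HTMLPAGEDOM_SRC);
    if ($file !== null && is_file($file)) {
        require $file;
    }
});
```

With this registered, referencing Wa72\HtmlPageDom\HtmlPage would load src/HtmlPage.php from the cloned directory, provided the file exists there.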
Usage
HtmlPageCrawler is a wrapper around DOMNodes. HtmlPageCrawler objects can be created using new or the static function HtmlPageCrawler::create(), which accepts an HTML string or a DOMNode (or an array of DOMNodes, a DOMNodeList, or even another Crawler object) as its argument.
Afterwards you can select nodes from the added DOM tree by calling filter() (equivalent to find() in jQuery) and alter the selected elements using the following jQuery-like manipulation functions:
- addClass(), hasClass(), removeClass(), toggleClass()
- after(), before()
- append(), appendTo()
- makeClone() (equivalent to clone() in jQuery)
- css() (alias getStyle() / setStyle())
- html() (get inner HTML content) and setInnerHtml($html)
- attr() (alias getAttribute() / setAttribute()), removeAttr()
- insertAfter(), insertBefore()
- makeEmpty() (equivalent to empty() in jQuery)
- prepend(), prependTo()
- remove()
- replaceAll(), replaceWith()
- text(), getCombinedText() (get text content of all nodes in the Crawler), and setText($text)
- wrap(), unwrap(), wrapInner(), unwrapInner(), wrapAll()
To get the modified DOM as HTML code, use html() (returns the innerHTML of the first node in your crawler object) or saveHTML() (returns the combined "outer" HTML code of all elements in the list).
See the full methods documentation in the generated API doc for HtmlPageCrawler.
Example:
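A minimal sketch of basic usage, assuming the library was installed with Composer so that vendor/autoload.php exists relative to the script. It uses only methods named above: create(), filter(), wrap(), saveHTML() and html().

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer installation

use Wa72\HtmlPageDom\HtmlPageCrawler;

// Create a crawler from an HTML fragment, much like jQuery's $() function.
$c = HtmlPageCrawler::create('<div id="content"><h1>Title</h1></div>');

// Select the h1 element and wrap it in a new div.
$c->filter('h1')->wrap('<div class="innercontent"></div>');

// saveHTML() returns the "outer" HTML of all nodes in the crawler.
echo $c->saveHTML(), PHP_EOL;

// html() returns only the inner HTML of the first node.
echo $c->html(), PHP_EOL;
```

The difference between the two output calls is that saveHTML() includes the outer div#content element, while html() starts at its children.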
Advanced example: remove the third column from an HTML table
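One way this could look, again assuming a Composer installation. The table content is made up for illustration, and the selector td:nth-child(3) relies on the CSS selector support that filter() inherits from Symfony's DomCrawler.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer installation

use Wa72\HtmlPageDom\HtmlPageCrawler;

// Illustrative sample data: two rows with four columns each.
$html = <<<HTML
<table>
    <tr><td>a1</td><td>b1</td><td>c1</td><td>d1</td></tr>
    <tr><td>a2</td><td>b2</td><td>c2</td><td>d2</td></tr>
</table>
HTML;

$c = HtmlPageCrawler::create($html);

// Select the third cell of every row and remove all matched cells from the DOM.
$c->filter('td:nth-child(3)')->remove();

// Print the table without its third column.
echo $c->saveHTML(), PHP_EOL;
```

Rows with fewer than three cells simply produce no match and are left unchanged.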
Usage examples for the HtmlPage class:
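A hedged sketch of HtmlPage used as a small template engine, assuming a Composer installation. The inline template string stands in for a template file, and save() called without arguments is assumed to return the page as an HTML string.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer installation

use Wa72\HtmlPageDom\HtmlPage;

// An inline string stands in for the contents of an HTML template file.
$template = '<!DOCTYPE html><html><head><title>Template</title></head>'
    . '<body><ul id="menu"></ul><div id="content"></div></body></html>';

$page = new HtmlPage($template);

// Set the page title and a meta description.
$page->setTitle('New page title');
$page->setMeta('description', 'A page built with HtmlPageDom');

// Fill the placeholders div#content and ul#menu.
$page->filter('#menu')->append('<li><a href="/">Home</a></li>');
$page->filter('#content')->setInnerHtml('<h1>Headline</h1><p class="text">A paragraph.</p>');

// Change attributes of a selected element, jQuery style.
$page->filter('#content h1')->addClass('headline')->css('margin-top', '10px');

echo $page->getTitle(), PHP_EOL;

// Pretty-print the page; minify() could be used instead of indent() for compressed output.
echo $page->indent()->save();
```

filter() on the page returns an HtmlPageCrawler, so all manipulation functions listed earlier are available on the selected elements.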
See also the generated API doc for HtmlPage
Limitations
- HtmlPageDom builds on top of PHP's DOM functions and uses the loadHTML() and saveHTML() methods of the DOMDocument class. That's why its output is always HTML, not XHTML.
- The HTML parser used by PHP is built for HTML4. It throws errors on HTML5-specific elements, which HtmlPageDom ignores, so HtmlPageDom is usable for HTML5 with some limitations.
- HtmlPageDom has not been tested with character encodings other than UTF-8.
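The HTML5 limitation stems from PHP's DOM extension itself. The sketch below uses only built-in DOM and libxml functions, with no HtmlPageDom code, to show how parser messages about HTML5 elements can be collected instead of being emitted as warnings. How HtmlPageDom handles this internally may differ, and the number of messages depends on the installed libxml2 version.

```php
<?php
// Collect libxml parser messages internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// <article> is an HTML5 element that an HTML4-era parser may not recognize.
$doc->loadHTML('<!DOCTYPE html><html><body><article><p>Hi</p></article></body></html>');

// Fetch and reset the collected messages.
$errors = libxml_get_errors();
libxml_clear_errors();

printf("%d parser message(s)\n", count($errors));

// The unknown element is still part of the DOM and of the serialized output.
echo $doc->saveHTML();
```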
History
When I discovered how easy it was to modify HTML documents using jQuery, I looked for a PHP library providing similar possibilities.
Googling around, I found SimpleHtmlDom and later Ganon, but both turned out to be very slow. Nevertheless, I used both libraries in my projects.
When Symfony2 appeared with its DomCrawler and CssSelector components, I thought: the functions for traversing the DOM tree and selecting elements by CSS selectors are already there, only the manipulation functions are missing. Let's implement them! So the HtmlPageDom project was born.
It turned out that building on PHP's DOM functions was a good choice: compared to SimpleHtmlDom and Ganon, HtmlPageDom is lightning fast. In one of my projects, I have a PHP script that takes a huge HTML page containing several hundred article elements and extracts them into individual HTML files (which are later loaded back into the original HTML page by AJAX on demand). Using SimpleHtmlDom, the script took 3 minutes (right, minutes!) to run, and I needed to raise PHP's memory limit to over 500 MB. Using Ganon as the HTML parsing and manipulation engine, it took even longer, about 5 minutes. After switching to HtmlPageDom, the same script doing the same processing tasks runs in only about one second (all on the same server). HtmlPageDom is really fast.
© 2012-2023 Christoph Singer. Licensed under the MIT License.
All versions of htmlpagedom with dependencies
ext-dom Version *
ext-libxml Version *
ext-mbstring Version *
symfony/dom-crawler Version ^4.4|^5
symfony/css-selector Version ^4.4|^5