HTML DOM

Modern DOM library written in PHP for HTML documents. This library is an attempt to implement the WHATWG's DOM specification and WHATWG HTML DOM extensions specification through a userland extension and encapsulation of PHP's built-in DOM. It exists because PHP's DOM is inaccurate, inadequate for use with any HTML, and extremely buggy. This implementation aims to fix as many of the PHP DOM's inaccuracies as possible, add features necessary for modern HTML development, and circumvent most of the bugs.

Requirements

Usage

Full documentation for most of the library shouldn't be necessary because it largely follows the specification, but because of how the library is to be used there are a few things that are glaringly different. These will be outlined below.

MensBeam\HTML\DOM\Document

MensBeam\HTML\DOM\Document implements \ArrayAccess, allowing the class to access named properties via array syntax:

Output:

There are limitations as to what is considered a named property. Refer to the WHATWG HTML DOM extensions specification for more details as to what is allowed to be accessed this way.
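To illustrate the mechanism only, here is a minimal sketch of how array syntax can be mapped onto a name lookup through \ArrayAccess. It is built on PHP's own DOMDocument rather than on this library; the class name NamedPropertyDemo and its simplified matching rule (a handful of element types, matched by their name attribute) are assumptions made for the example and do not reproduce the full named-property rules of the specification.

```php
<?php
// Toy illustration of \ArrayAccess-based named lookup.
// NamedPropertyDemo is hypothetical; it is NOT the library's Document class.
final class NamedPropertyDemo implements ArrayAccess {
    private DOMXPath $xpath;

    public function __construct(string $html) {
        // Collect parser warnings internally instead of emitting them.
        libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $this->xpath = new DOMXPath($doc);
    }

    // Simplified rule (an assumption): only these element types, matched by @name.
    private function find(string $name): ?DOMElement {
        $query = '//img[@name] | //iframe[@name] | //form[@name] | //embed[@name] | //object[@name]';
        foreach ($this->xpath->query($query) as $element) {
            if ($element->getAttribute('name') === $name) {
                return $element;
            }
        }
        return null;
    }

    public function offsetExists(mixed $offset): bool {
        return $this->find((string)$offset) !== null;
    }

    public function offsetGet(mixed $offset): mixed {
        return $this->find((string)$offset);
    }

    // Named properties are read-only in this sketch.
    public function offsetSet(mixed $offset, mixed $value): void {
        throw new LogicException('Named properties are read-only');
    }

    public function offsetUnset(mixed $offset): void {
        throw new LogicException('Named properties are read-only');
    }
}

$doc = new NamedPropertyDemo('<!DOCTYPE html><html><body><img name="ook"><iframe name="eek"></iframe></body></html>');
echo $doc['ook']->nodeName, "\n";               // img
echo isset($doc['ack']) ? 'yes' : 'no', "\n";   // no
```

With the library itself, the equivalent access would be written directly on the document, as in $document['ook'].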

MensBeam\HTML\DOM\Document::__construct

Creates a new MensBeam\HTML\DOM\Document object.

Examples

MensBeam\HTML\DOM\Document::destroy

Destroys references associated with the instance so it may be garbage collected by PHP. Because of how PHP's garbage collection works and the poor state of the library PHP's DOM is built on, references must be kept in userland for every created document. Unfortunately, this means the method should be called manually whenever the document is no longer needed.

Example

MensBeam\HTML\DOM\Document::registerXPathFunctions

Register PHP functions as XPath functions. Works like \DOMXPath::registerPhpFunctions except that the php namespace does not need to be registered.
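For comparison, this is roughly what the equivalent setup looks like with PHP's built-in \DOMXPath, where the php namespace has to be registered by hand before a PHP function can be called from an expression. The sketch uses only PHP's own DOM classes; the sample XML is made up for the example.

```php
<?php
// Built-in PHP DOM only; this does not use the library.
$doc = new DOMDocument();
$doc->loadXML('<root><item>ook</item><item>eek</item></root>');

$xpath = new DOMXPath($doc);
// With the built-in DOM this namespace must be registered manually.
// According to the text above, the library makes this step unnecessary.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Restrict callable PHP functions to strtoupper.
$xpath->registerPhpFunctions('strtoupper');

$result = $xpath->evaluate('php:functionString("strtoupper", string(/root/item[1]))');
echo $result, "\n"; // OOK
```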

Example

Output:

MensBeam\HTML\DOM\Document::serialize

Converts a node to a string.

Examples

MensBeam\HTML\DOM\Document::serializeInner

Converts a node to a string but only serializes the node's contents.

Examples

MensBeam\HTML\DOM\Node

Common namespace constants are provided in MensBeam\HTML\DOM\Node to make using namespaces with this library not so onerous. In addition, constants are provided here to be used with MensBeam\HTML\DOM\ParentNode::walk. MensBeam\HTML\DOM\Node also implements \Stringable which means that any node can be simply converted to a string to serialize it.
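As a rough illustration of what implementing \Stringable means for a node wrapper, the sketch below wraps an element from PHP's built-in DOM and serializes it when cast to a string. StringableNode is a hypothetical class written for this example, and it serializes as XML through DOMDocument::saveXML, which is not how the library serializes HTML.

```php
<?php
// Hypothetical wrapper for illustration; NOT the library's Node class.
final class StringableNode implements Stringable {
    public function __construct(private DOMElement $inner) {}

    // Casting the wrapper to a string serializes the wrapped element.
    public function __toString(): string {
        return $this->inner->ownerDocument->saveXML($this->inner);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<p>Ook <b>eek</b></p>');

$node = new StringableNode($doc->documentElement);
echo $node, "\n"; // <p>Ook <b>eek</b></p>
```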

Properties

innerNode: A readonly property that returns the encapsulated inner element.

WARNING: Manipulating this node directly can result in unexpected behavior. This is available in the public API only so the class may be interfaced with other libraries which expect a \DOMDocument object such as MensBeam\Lit.

MensBeam\HTML\DOM\Node::getNodePath

Carryover from PHP's DOM. It's a useful method that returns an XPath location path for the node. Returns a string if successful or null on failure.

MensBeam\HTML\DOM\ParentNode

MensBeam\HTML\DOM\ParentNode::walk

Applies the callback filter while walking down the DOM tree and yields nodes matching the filter in a generator.
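The general idea of a filtered, generator-based walk can be sketched with PHP's built-in DOM. The helper walkDescendants below is hypothetical and uses a plain boolean filter; the library's own walk method is driven by the constants on MensBeam\HTML\DOM\Node mentioned earlier, which this sketch does not attempt to reproduce.

```php
<?php
/**
 * Hypothetical helper (not part of the library): yields every descendant of
 * $root, in document order, for which $filter returns true.
 */
function walkDescendants(DOMNode $root, callable $filter): Generator {
    if (!$root->hasChildNodes()) {
        return;
    }
    foreach ($root->childNodes as $child) {
        if ($filter($child)) {
            yield $child;
        }
        // Recurse into the child's own subtree.
        yield from walkDescendants($child, $filter);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<div><!--Ook!--><div><div><!--Eek!--></div></div></div>');

// Accept only comment nodes.
$isComment = fn(DOMNode $n): bool => $n instanceof DOMComment;

foreach (walkDescendants($doc, $isComment) as $comment) {
    echo $comment->data, "\n";
}
```

Because nested generators delegated with yield from reuse integer keys, pass false as the second argument to iterator_to_array if the results are collected into an array.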

Example

Output:

MensBeam\HTML\DOM\XPathEvaluator

MensBeam\HTML\DOM\XPathEvaluator::registerXPathFunctions

Register PHP functions as XPath functions. Works like \DOMXPath::registerPhpFunctions except that the php namespace does not need to be registered.

Example

Output:

MensBeam\HTML\DOM\XPathResult

MensBeam\HTML\DOM\XPathResult implements \ArrayAccess, \Countable, and \Iterator and will allow for accessing as if it is an array when the result type is MensBeam\HTML\DOM\XPathResult::ORDERED_NODE_ITERATOR_TYPE, MensBeam\HTML\DOM\XPathResult::UNORDERED_NODE_ITERATOR_TYPE, MensBeam\HTML\DOM\XPathResult::ORDERED_NODE_SNAPSHOT_TYPE, or MensBeam\HTML\DOM\XPathResult::UNORDERED_NODE_SNAPSHOT_TYPE. This is not in the specification, but not being able to simply iterate over a result is absurd.
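What implementing these three interfaces buys in practice can be shown with a small stand-in class. SnapshotResult below is hypothetical and simply wraps a PHP array of strings; it is meant only to show the array-style access, counting, and foreach iteration described above, not the library's internals.

```php
<?php
// Hypothetical stand-in for illustration; NOT the library's XPathResult.
final class SnapshotResult implements ArrayAccess, Countable, Iterator {
    private array $items;
    private int $position = 0;

    public function __construct(array $items) {
        $this->items = array_values($items);
    }

    // Countable
    public function count(): int {
        return count($this->items);
    }

    // ArrayAccess (read-only in this sketch)
    public function offsetExists(mixed $offset): bool {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void {
        throw new LogicException('Results are read-only');
    }

    public function offsetUnset(mixed $offset): void {
        throw new LogicException('Results are read-only');
    }

    // Iterator
    public function current(): mixed {
        return $this->items[$this->position] ?? null;
    }

    public function key(): int {
        return $this->position;
    }

    public function next(): void {
        $this->position++;
    }

    public function rewind(): void {
        $this->position = 0;
    }

    public function valid(): bool {
        return $this->position < count($this->items);
    }
}

$result = new SnapshotResult(['p', 'a', 'img']);

echo count($result), "\n";   // 3
echo $result[1], "\n";       // a
foreach ($result as $index => $name) {
    echo $index, ': ', $name, "\n";
}
```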

MensBeam\HTML\DOM\Inner\Document

This is the document object that is wrapped. There are a few things that are publicly available. This is only available in the public API so the class may be interfaced with other libraries which expect a \DOMDocument object such as MensBeam\Lit.

Properties

wrapperNode: A readonly property that returns the wrapper document for the document.

MensBeam\HTML\DOM\Inner\Document::getWrapperNode

Returns the wrapper node that corresponds to the provided inner node. If one does not exist it is created.

Limitations & Differences from Specification

The primary aim of this library is accuracy. However, some changes have been necessary, whether because of limitations imposed by PHP's DOM, assumptions made by the specification that aren't applicable to a PHP library, or simple impracticality. There appear to be a lot of deviations from the specification below, but this is simply an exhaustive list of details about the implementation, with a few even explaining why we follow the specification instead of what browsers do.

  1. Any mention of scripting or anything necessary because of scripting (such as the ElementCreationOptions options dictionary on Document::createElement) will not be implemented.
  2. Due to a PHP bug which severely degrades performance with large documents and in consideration of existing PHP software and because of bizarre uncircumventable xmlns attribute bugs when the document is in the HTML namespace, HTML elements in HTML documents are placed in the null namespace internally rather than in the HTML namespace. However, externally they will be shown as having the HTML namespace. Even though null namespaced elements do not exist in the HTML specification one can create them using the DOM. However, in this implementation they will be treated as HTML namespaced elements due to the HTML namespace limitation.
  3. In the WHATWG HTML DOM extensions specification Document has named properties. In JavaScript one accesses them through either property notation (document.ook) or array notation (document['ook']). In PHP this is impractical because there's a differentiation between the two notations. Instead, all named properties need to be accessed via array notation ($document['ook']).
  4. The specification is written entirely with browsers in mind and isn't concerned with the DOM's being used outside of the browser. In a browser there is always a document created by parsing serialized markup, and the DOM spec always assumes such. This is impossible in the way this PHP library is intended to be used. The default when creating a new Document is to set its content type to "application/xml". This isn't ideal when creating an HTML document entirely through the DOM, so this implementation will instead default to "text/html" unless using XMLDocument.
  5. Again, because the specification assumes the implementation will be a browser, processing instructions are supposed to be parsed as comments. While it makes sense for a browser, this is impractical for a DOM library used outside of the browser where one may want to manipulate them; this library will instead preserve them when parsing a document but will convert them to comments when using Element::innerHTML.
  6. Per the specification an actual HTML document cannot be created outside of the parser itself unless created via DOMImplementation::createHTMLDocument. Also, per the spec DOMImplementation cannot be instantiated via its constructor. This would require in this library's use case first creating a document then creating an HTML document via the first document's implementation. This is impractical and stupid, so in this library (like PHP DOM itself) a DOMImplementation can be instantiated independent of a document.
  7. The specification shows Document as being able to be instantiated through its constructor and shows XMLDocument as inheriting from Document. In browsers XMLDocument cannot be instantiated through its constructor. We will follow the specification here and allow it.
  8. CDATA section nodes, text nodes, and document fragments per the specification can be instantiated by their constructors independent of the Document::createCDATASection, Document::createTextNode, and Document::createDocumentFragment methods respectively. This is not possible currently with this library and probably never will be due to the difficulty of implementing it and the awkwardness of their being different from every other node type in this respect.
  9. As the DOM is presently specified, CDATA section nodes cannot be created on an HTML document. However, they can be created (and rightly so) on XML documents. The DOM, however, does not prohibit importing CDATA section nodes into an HTML document, and they will be appended to the document as such. This appears to be a glaring omission by the maintainers of the specification. This library will allow importing of CDATA section nodes into HTML documents but will instead convert them to text nodes.
  10. This implementation will not implement the NodeIterator and TreeWalker APIs. They are horribly conceived and impractical APIs that few people actually use because it's literally easier and faster to write recursive loops to walk through the DOM than it is to use those APIs. Walking downward through the tree has been replaced with the ParentNode::walk generator, and walking through adjacent children and moonwalking up the DOM tree can be accomplished through simple while or do/while loops.
  11. All of the Range APIs will also not be implemented due to the sheer complexity of creating them in userland and how it adds undue difficulty to node manipulation in the "core" DOM. Numerous operations reference in excruciating detail what to do with Ranges when manipulating nodes and would have to be added here to be compliant or mostly so -- slowing everything else down in the process on an already extremely front-heavy library.
  12. The DOMParser and XMLSerializer APIs will not be implemented because they are ridiculous and limited in their scope. For instance, DOMParser::parseFromString won't set a document's character set to anything but UTF-8. This library needs to be able to print to other encodings due to the nature of how it is used. Document::__construct will accept optional $source and $charset arguments, and there are both Document::load and Document::loadFile methods for loading DOM from a string or a file respectively.
  13. Aside from HTMLElement, HTMLPreElement, HTMLTemplateElement, HTMLUnknownElement, MathMLElement, and SVGElement none of the specific derived element classes (such as HTMLAnchorElement or SVGSVGElement) are implemented. The ones listed before are required for the element interface algorithm. The focus of this library will be on the core DOM before moving on to those -- if ever.
  14. This class is meant to be used with HTML, but it will work -MOSTLY- as needed with XML. Loading of XML uses PHP DOM's XML parser, which does not completely conform to the XML specification. Writing an actual conforming XML parser is outside of the scope of this library. One notable feature of this library which won't work per the XML specification is Unicode characters in element names. XML allows for capital letters while HTML doesn't. This implementation's workaround (because PHP's DOM doesn't support Unicode at all in element names) internally coerces all non-ASCII characters to 'Uxxxx', which would be valid modern XML names. Something like a lookup table would be necessary for XML instead, but this isn't implemented and may not be because of complexity.
  15. While there is implementation of much of the XPath extensions, there will only be support for XPath 1.0 because that is all PHP DOM's XPath supports.
  16. This library's XPath API is -- like the rest of the library itself -- a wrapper that wraps PHP's implementation but instead works like the specification, so there is no need to manually register namespaces. Namespaces that are associated with prefixes will be looked up when evaluating the expression if an XPathNSResolver is specified. Registering PHP functions for use within XPath isn't in the specification, but it is available through Document::registerXPathFunctions and XPathEvaluator::registerXPathFunctions.
  17. XPathEvaluatorBase::evaluate has a result argument where one provides it with an existing result object to use. I can't find any usable documentation on what this is supposed to do, and the specifications on it are vague. So, at present it does nothing until what it needs to do can be deduced.
  18. At present XPath expressions cannot select elements or attributes whose names contain non-ASCII characters. This is because those nodes are coerced internally to work within PHP's DOM, which doesn't support those characters. This can be worked around by coercing names in XPath queries, but that can only be reliably accomplished through an XPath parser. Writing an entire XPath parser for what amounts to an edge case isn't desirable.
  19. The XPath API itself is an ill-conceived API that is entirely impractical to use because doing anything with the XPathResult object is cumbersome and stupid. Per the specification one cannot iterate over the result even if the result type is an iterator type (why in the hell call it that, then?). One has to instead repeatedly call the XPathResult::iterateNext() method. This implementation will allow for treating XPathResult snapshot or iterator types as arrays.
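
Item 10 above says that walking across adjacent nodes and up the tree needs nothing more than simple loops. The sketch below shows what such loops look like, using PHP's built-in DOM and a made-up sample document. Since the library follows the specification, the same parentNode and nextSibling properties should be available on its nodes, but that is an assumption here rather than something this example verifies.

```php
<?php
// Built-in PHP DOM only; this does not use the library.
$doc = new DOMDocument();
$doc->loadXML('<html><body><div><p><em>ook</em></p><p>eek</p><p>ack</p></div></body></html>');

$em = $doc->getElementsByTagName('em')->item(0);

// Walk up the tree, collecting the names of ancestor elements.
$ancestors = [];
$node = $em;
while (($node = $node->parentNode) instanceof DOMElement) {
    $ancestors[] = $node->nodeName;
}
echo implode(' > ', $ancestors), "\n"; // p > div > body > html

// Walk across the siblings that follow the first <p>.
$siblings = [];
$sibling = $em->parentNode;
while ($sibling = $sibling->nextSibling) {
    $siblings[] = $sibling->textContent;
}
echo implode(', ', $siblings), "\n"; // eek, ack
```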

All versions of html-dom with dependencies

Requires:

  • php >=8.0.2
  • ext-dom *
  • mensbeam/html-parser >=1.2.1
  • symfony/css-selector >=5.3
  • mensbeam/getters-and-setters >=1.1