HTML Table
bakame/html-table is a small PHP package that allows you to parse, import and manipulate tabular data represented as an HTML table. Once installed you will be able to do the following:
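As a minimal sketch of typical usage, the snippet below parses an inline HTML string with the default configuration and iterates over the resulting records. The sample markup, the column names and the vendor/autoload.php path are assumptions made for illustration; Parser::new(), parseHtml() and getCaption() are the methods described in the documentation that follows.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\Parser;

// Hypothetical sample document: one table with a caption, a header row and two records.
$html = <<<HTML
<table>
  <caption>Standings</caption>
  <thead>
    <tr><th>team</th><th>points</th></tr>
  </thead>
  <tbody>
    <tr><td>Lions</td><td>12</td></tr>
    <tr><td>Tigers</td><td>9</td></tr>
  </tbody>
</table>
HTML;

// Default configuration: first table of the page, header taken from the first thead row.
$table = Parser::new()->parseHtml($html);

echo $table->getCaption(), PHP_EOL;

// The returned Table can be iterated like any league/csv tabular data reader.
foreach ($table as $record) {
    echo $record['team'], ': ', $record['points'], PHP_EOL;
}
```

Because the returned object is described as implementing League\Csv\TabularDataReader, the filtering and sorting features of league/csv should also be available on it.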
System Requirements
The league/csv library, version 9.11.0 or later, is required.
Installation
Use Composer: composer require bakame/html-table
Documentation
The Parser can convert a file (a PHP stream or a path, with an optional context as with fopen) or an HTML document into an object implementing League\Csv\TabularDataReader. Once converted, you can use all the methods and features made available by the interface (see ResultSet for more information).
The Parser itself is immutable: whenever you change a configuration option, a new instance is returned.
The Parser constructor is private; to instantiate the object you are required to use the new named constructor instead.
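A short sketch of instantiation and immutability follows. The method names are the ones documented in the sections below; the argument values (the caption text and the id attribute) are hypothetical, and the assumption is that tableCaption accepts a string.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\Parser;

// The constructor is private, so the named constructor is the entry point.
$parser = Parser::new();

// Each configuration call returns a Parser instance; $parser itself is left unchanged.
$configured = $parser
    ->ignoreTableHeader()
    ->withoutFormatter()
    ->tableCaption('this is a generated caption') // hypothetical caption text
    ->tablePosition('table-id');                  // hypothetical id attribute value

// Compare the two instances to observe the immutability described above.
var_dump($configured === $parser);
```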
parseHtml and parseFile
To extract and parse your table use either the parseHtml or parseFile methods.
If parsing is not possible a ParseError exception will be thrown.
parseHtml parses an HTML page represented by:
- a string,
- a Stringable object,
- a DOMDocument,
- a DOMElement,
- or a SimpleXMLElement
whereas parseFile works with:
- a filepath,
- or a PHP readable stream.
Both methods return a Table instance, which implements the League\Csv\TabularDataReader interface and also gives access to the table caption, if present, via the getCaption method.
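The following sketch feeds parseFile a readable stream and guards the call against a parsing failure. The in-memory stream and its markup are hypothetical, and the exception is assumed to live in the Bakame\HtmlTable namespace alongside the Parser.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\ParseError; // assumed namespace for the documented exception
use Bakame\HtmlTable\Parser;

$parser = Parser::new();

// Write a hypothetical document to a temporary stream so no file on disk is needed.
$stream = fopen('php://temp', 'r+');
fwrite($stream, '<table><caption>Scores</caption><thead><tr><th>name</th></tr></thead>'
    . '<tbody><tr><td>Ada</td></tr></tbody></table>');
rewind($stream);

try {
    $table = $parser->parseFile($stream);
    echo $table->getCaption() ?? '(no caption)', PHP_EOL;
    foreach ($table as $record) {
        echo implode(', ', $record), PHP_EOL;
    }
} catch (ParseError $exception) {
    // Reached when the parser cannot extract a table from the input.
    echo 'Unable to parse the table: ', $exception->getMessage(), PHP_EOL;
} finally {
    fclose($stream);
}
```

The same try/catch structure applies to parseHtml; only the input type differs.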
Default configuration
By default, when calling the Parser::new() named constructor the parser will:
- try to parse the first table found in the page.
- expect the table header row to be the first tr found in the thead section of your table.
- exclude the table thead section when extracting the table content.
- ignore XML errors.
- have no formatter attached.
- have no default caption to use if none is present in the table.
Each of the following settings can be changed to adapt the conversion to your business rules:
tablePosition and tableXpathPosition
Selecting the table to parse in the HTML page can be done using two (2) methods: Parser::tablePosition and Parser::tableXpathPosition.
If you know the table position in the page as an integer offset, or if you know its id attribute value, you should use Parser::tablePosition; otherwise favor Parser::tableXpathPosition, which expects an XPath expression.
If the expression is valid and a list of tables is found, the first result will be used.
Parser::tableXpathPosition and Parser::tablePosition override each other. It is recommended to use one or the other, but not both at the same time.
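The three ways of selecting a table can be sketched as follows. The id value, the offset and the XPath expression are hypothetical, and the integer offset is assumed to be zero-based.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\Parser;

// Select the table by its id attribute, i.e. <table id="table-id">.
$byId = Parser::new()->tablePosition('table-id');

// Select the table by its integer offset (assumed zero-based, so 3 is the fourth table).
$byOffset = Parser::new()->tablePosition(3);

// Select the table with an XPath expression; the first matching table is used.
$byXpath = Parser::new()->tableXpathPosition('//main/div/table');
```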
tableCaption
You can optionally define a caption for your table if none is present or found during parsing.
tableHeader, tableHeaderPosition, ignoreTableHeader and resolveTableHeader
The following settings configure the Parser in relation to the table header. By default, the parser will try to parse the first tr tag found in the thead section of the table.
But you can override this behaviour using one of these settings:
tableHeaderPosition
Tells where to locate and resolve the table header.
The method uses the Bakame\HtmlTable\Section enum to designate which table section to use to resolve the header.
If Section::tr is used, tr tags will be used independently of their section.
The second argument is the table header tr offset; it defaults to 0 (i.e. the first row).
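A brief sketch of both forms is shown below. Section::tr is named in the text above; Section::thead is an assumed case name, chosen by analogy, and the offset 3 is arbitrary.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\Parser;
use Bakame\HtmlTable\Section;

// Use the row at offset 3 (the fourth row) of the thead section as the header.
// Section::thead is an assumed enum case name.
$fromThead = Parser::new()->tableHeaderPosition(Section::thead, 3);

// Use the first tr of the table, whatever section it belongs to (offset defaults to 0).
$fromAnyRow = Parser::new()->tableHeaderPosition(Section::tr);
```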
ignoreTableHeader and resolveTableHeader
Instructs the parser whether or not to resolve the table header using the tableHeaderPosition configuration.
If no resolution is done, no header will be included in the returned Table instance.
tableHeader
You can specify the header of your table directly. If you specify a non-empty array as the table header, it will take precedence over any other table header related option.
Because the result is tabular data, each header cell MUST be unique; otherwise an exception will be thrown.
You can skip or re-arrange the source columns by omitting their offsets and/or by re-ordering the offsets.
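The sketch below shows a plain header and a header that skips and re-orders columns. The column names and offsets are hypothetical, and the array keys are assumed to be interpreted as source column offsets, as the paragraph above suggests.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\Parser;

// Name the first three columns in their source order.
$named = Parser::new()->tableHeader(['rank', 'team', 'winner']);

// Keep only source columns 3, 7 and 5, in that order; all other columns are skipped.
// Interpreting the keys as column offsets is an assumption drawn from the text above.
$picked = Parser::new()->tableHeader([3 => 'rank', 7 => 'winner', 5 => 'team']);
```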
includeSection and excludeSection
Tells which sections should be parsed, based on the Section enum.
By default, the thead section is not parsed. If a thead row is selected to be the header, it will be parsed independently of this setting.
⚠️ Tip: to be sure of which sections will be parsed, first remove all previous settings before applying your configuration, as shown below:
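A possible way to write the two calls is sketched here. Section::tbody is an assumed enum case name, and the loop calls excludeSection once per case so that the sketch does not depend on whether the method accepts several sections at once.

```php
<?php

// Assumes the package was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Bakame\HtmlTable\Parser;
use Bakame\HtmlTable\Section;

// First call: adds tbody to whatever the default configuration already includes.
$first = Parser::new()->includeSection(Section::tbody); // Section::tbody is assumed

// Second call: exclude every section first, then include tbody only.
$second = Parser::new();
foreach (Section::cases() as $section) {
    $second = $second->excludeSection($section);
}
$second = $second->includeSection(Section::tbody);
```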
The first call will still include the tfoot and the tr sections, whereas the second call removes any previous setting, guaranteeing that only the tbody section, if present, will be parsed.
withFormatter and withoutFormatter
Adds or removes a record formatter applied to the data extracted from the table before you can access it. The header, if defined, is not affected by the formatter.
The formatter closure should accept a single record as an array and return the formatted record as an array.
If a header was defined or specified, the submitted record will be keyed by the header names; otherwise an array list is provided.
The following formatter will work on any table content as long as it is defined as a string.
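A formatter of this kind might look like the sketch below, which lower-cases every cell. It is plain PHP with hypothetical sample records, and it assumes the closure receives one record as an array and returns the transformed array.

```php
<?php

// Lower-cases every cell of a record. array_map keeps the keys when given a
// single array, so the closure works with or without a header.
$lowercase = fn (array $record): array => array_map(
    fn (string $value): string => strtolower($value),
    $record
);

// Hypothetical records: one keyed by header names, one as a plain list.
var_dump($lowercase(['Team' => 'LIONS', 'Player' => 'Ada']));
var_dump($lowercase(['LIONS', 'Ada']));
```

Such a closure would then be attached with withFormatter.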
The following formatter will only work if the table has a header attached to it with a column named count.
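As an illustration, a formatter that depends on the count column could be written as in this sketch. It is plain PHP with a hypothetical sample record, and it assumes the record is keyed by header names.

```php
<?php

// Casts the "count" cell to an integer; every other cell is left untouched.
// This relies on the record being keyed by header names, including "count".
$castCount = function (array $record): array {
    $record['count'] = (int) $record['count'];

    return $record;
};

// Hypothetical record as it would be submitted when a header is attached.
var_dump($castCount(['name' => 'apples', 'count' => '42']));
```

Without a header the record is a plain list, so the count key would not exist and the formatter would not behave as intended.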
ignoreXmlErrors and failOnXmlErrors
Tells whether the parser should ignore or throw in case of malformed HTML content.
Testing
The library:
- has a PHPUnit test suite
- has a coding style compliance test suite using PHP CS Fixer.
- has a code analysis compliance test suite using PHPStan.
To run the tests, run the following command from the project folder.
Security
If you discover any security related issues, please email [email protected] instead of using the issue tracker.
Credits
License
The MIT License (MIT). Please see License File for more information.
Dependencies
- ext-libxml: *
- ext-mbstring: *
- ext-simplexml: *
- league/csv: ^9.11.0