Download the PHP package prewk/xml-string-streamer without Composer
On this page you can find all versions of the php package prewk/xml-string-streamer. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package xml-string-streamer
Short Description Stream large XML files with low memory consumption
License MIT
Homepage https://github.com/prewk/xml-string-streamer
Information about the package xml-string-streamer
xml-string-streamer
Purpose
To stream XML files too big to fit into memory, with very low memory consumption. This library is a successor to XmlStreamer.
Installation
Legacy support
- All versions below 1 support PHP 5.3 - 7.2
- Version 1 and above support PHP 7.2+
With Composer
Run `composer require prewk/xml-string-streamer` to install this package.
Usage
Let's say you have a 2 GB XML file, gigantic.xml, containing a long list of customer items under a single root element.
Create a streamer and parse it:
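A minimal sketch of this step, assuming the package was installed with Composer and that the static factory is `XmlStringStreamer::createStringWalkerParser()`. The small sample file, its `<gigantic>` root and the `firstName`/`lastName` fields are illustrative stand-ins for the real 2 GB file.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;

// Illustrative stand-in for gigantic.xml (element names are assumptions).
$file = sys_get_temp_dir() . '/gigantic.xml';
file_put_contents($file, <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<gigantic>
    <customer>
        <firstName>Jane</firstName>
        <lastName>Doe</lastName>
    </customer>
    <customer>
        <firstName>John</firstName>
        <lastName>Smith</lastName>
    </customer>
</gigantic>
XML
);

// Convenience factory: a file stream plus the default StringWalker parser.
$streamer = XmlStringStreamer::createStringWalkerParser($file);

$firstNames = [];
// getNode() returns one captured element as a string, or false at the end.
while ($node = $streamer->getNode()) {
    $customer = simplexml_load_string($node);
    $firstNames[] = (string) $customer->firstName;
}

echo implode(', ', $firstNames), PHP_EOL;
```

Only one `<customer>` element is held as a string at a time, which is what keeps memory use low.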
Without the convenience method (functionally equivalent):
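The explicit wiring might look like the sketch below. It assumes a Composer install; the temporary file and its contents are hypothetical and exist only so the snippet runs as-is.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;
use Prewk\XmlStringStreamer\Parser;
use Prewk\XmlStringStreamer\Stream;

// Hypothetical sample file.
$file = sys_get_temp_dir() . '/gigantic-explicit.xml';
file_put_contents($file, <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<gigantic>
    <customer><firstName>Jane</firstName></customer>
</gigantic>
XML
);

// Read the file in 1024-byte chunks.
$stream = new Stream\File($file, 1024);
// Default parser with default options.
$parser = new Parser\StringWalker();
// The streamer ties the parser and the stream together.
$streamer = new XmlStringStreamer($parser, $stream);

$count = 0;
while ($node = $streamer->getNode()) {
    $count++;
}

echo "Captured $count node(s)", PHP_EOL;
```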
Convenience method for the UniqueNode parser:
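A short sketch, assuming the factory is `XmlStringStreamer::createUniqueNodeParser()` and that the elements to capture are named `customer` (an illustrative name, as is the sample file):

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;

// Hypothetical sample file.
$file = sys_get_temp_dir() . '/customers-unique.xml';
file_put_contents($file, <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<gigantic>
    <customer><firstName>Jane</firstName></customer>
    <customer><firstName>John</firstName></customer>
</gigantic>
XML
);

// The UniqueNode parser needs to know which element name to capture.
$streamer = XmlStringStreamer::createUniqueNodeParser($file, ['uniqueNode' => 'customer']);

$firstNames = [];
while ($node = $streamer->getNode()) {
    $firstNames[] = (string) simplexml_load_string($node)->firstName;
}

echo implode(', ', $firstNames), PHP_EOL;
```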
Parsers
Parser\StringWalker
Works like PHP's XMLReader: it walks the XML tree node by node and captures nodes at a configurable depth (see the captureDepth option).
Parser\UniqueNode
A much faster parser that captures everything between a provided element's opening and closing tags. One prerequisite applies: the element must never appear nested inside another element of the same name.
Stream providers
Stream\File
Use this provider to parse large XML files on disk. Pick a chunk size, for example: 1024 bytes.
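For instance, a file provider with a 1024-byte chunk size could be constructed as sketched here. The file name is hypothetical, and the file is created first only so the snippet runs on its own.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer\Stream;

// Hypothetical file on disk.
$file = sys_get_temp_dir() . '/large-xml-file.xml';
file_put_contents($file, '<root><item>1</item></root>');

$CHUNK_SIZE = 1024; // bytes read from disk per chunk
$provider = new Stream\File($file, $CHUNK_SIZE);
```

The provider is then handed to `XmlStringStreamer` together with a parser.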
Stream\Stdin
Use this provider if you want to create a CLI application that streams large XML files through STDIN.
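A sketch of such a CLI script, assuming `Stream\Stdin` takes the chunk size as its first constructor argument. The script name in the usage note is hypothetical.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;
use Prewk\XmlStringStreamer\Parser;
use Prewk\XmlStringStreamer\Stream;

$CHUNK_SIZE = 1024; // bytes read from STDIN per chunk
$stream = new Stream\Stdin($CHUNK_SIZE);
$streamer = new XmlStringStreamer(new Parser\StringWalker(), $stream);

$count = 0;
while ($node = $streamer->getNode()) {
    $count++;
}

// Report on STDERR so STDOUT stays free for real output.
fwrite(STDERR, "Streamed $count node(s) from STDIN" . PHP_EOL);
```

It could be invoked as `php stream.php < gigantic.xml`.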
Stream\Guzzle
Use this provider if you want to stream over HTTP with Guzzle. Resides in its own repo due to its higher PHP version requirements (5.5): https://github.com/prewk/xml-string-streamer-guzzle
StringWalker Options
Usage
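Options are passed as an associative array to the parser's constructor. A minimal sketch, using `captureDepth` as the example option:

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer\Parser;

// Any option left out keeps its default value.
$options = [
    'captureDepth' => 3,
];

$parser = new Parser\StringWalker($options);
```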
Available options for the StringWalker parser
Option | Default | Description |
---|---|---|
(int) captureDepth | 2 | Depth we start collecting nodes at |
(array) tags | See example | Supported tags |
(bool) expectGT | false | Whether to support > in XML comments/CDATA or not |
(array) tagsWithAllowedGT | See example | If expectGT is true, this option lists the tags with allowed > characters in them |
Examples
captureDepth
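Default behavior with a capture depth of 2: when the `<capture-me>` elements are direct children of the root element, the parser will capture the `<capture-me>` nodes.
But say your XML wraps the `<capture-me>` elements in an intermediate element between them and the root. Then you'll need to set the capture depth to 3 to capture the `<capture-me>` nodes.
Node depth visualized: the XML declaration sits at depth 0, the root element at depth 1, its children at depth 2, their children at depth 3, and so on.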
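The depth-3 case can be sketched as follows. The element names `<root-node>` and `<sub-node>` and the temporary file are illustrative assumptions.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;

// <capture-me> sits one level deeper than the default capture depth of 2.
$file = sys_get_temp_dir() . '/capture-depth.xml';
file_put_contents($file, <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<root-node>
    <sub-node>
        <capture-me>first</capture-me>
        <capture-me>second</capture-me>
    </sub-node>
</root-node>
XML
);

// Depth 1 = <root-node>, depth 2 = <sub-node>, depth 3 = <capture-me>.
$streamer = XmlStringStreamer::createStringWalkerParser($file, ['captureDepth' => 3]);

$values = [];
while ($node = $streamer->getNode()) {
    $values[] = (string) simplexml_load_string($node);
}

echo implode(', ', $values), PHP_EOL;
```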
tags
Default value:
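The default is reproduced here as recalled from the upstream README; treat it as an assumption and check it against the StringWalker source of the version you have installed.

```php
<?php
// Each entry: [opening tag, closing tag, change in depth].
$defaultTags = [
    ['<?', '?>', 0],          // XML declaration and processing instructions
    ['<!--', '-->', 0],       // comments
    ['<![CDATA[', ']]>', 0],  // CDATA sections
    ['<!', '>', 0],           // other declarations such as DOCTYPE
    ['</', '>', -1],          // closing tag: one level up
    ['<', '/>', 0],           // self-closing tag: depth unchanged
    ['<', '>', 1],            // opening tag: one level down
];
```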
First parameter: opening tag, second parameter: closing tag, third parameter: depth.
If you know that your XML doesn't have any XML comments, CDATA or self-closing tags, you can tune your performance by setting the tags option and omitting them:
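A sketch of such a trimmed configuration. It assumes the input really contains no comments, CDATA sections or self-closing tags, and keeps only the declaration, closing-tag and opening-tag entries.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer\Parser;

$parser = new Parser\StringWalker([
    'tags' => [
        ['<?', '?>', 0],   // XML declaration
        ['<!', '>', 0],    // declarations such as DOCTYPE
        ['</', '>', -1],   // closing tag
        ['<', '>', 1],     // opening tag
    ],
]);
```

With fewer tag patterns to test, the parser has less work to do per tag. If the input does turn out to contain an omitted construct, results are likely to be wrong, so this is only safe for XML you control.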
expectGT & tagsWithAllowedGT
You can allow the `>` character within XML comments and CDATA sections if you want. This is pretty uncommon, and therefore turned off by default for performance reasons.
Default value for tagsWithAllowedGT:
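A sketch that turns the feature on and spells out the tag pairs explicitly. The pairs shown are the default as recalled from the upstream README, so treat them as an assumption to verify.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer\Parser;

$parser = new Parser\StringWalker([
    'expectGT' => true,
    // Tag pairs whose content may contain a literal ">".
    'tagsWithAllowedGT' => [
        ['<!--', '-->'],
        ['<![CDATA[', ']]>'],
    ],
]);
```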
UniqueNode Options
Usage
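As with the StringWalker parser, options go into the constructor as an associative array. A minimal sketch, where `TheNodeToCapture` is a placeholder element name:

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer\Parser;

$options = [
    'uniqueNode' => 'TheNodeToCapture', // required
];

$parser = new Parser\UniqueNode($options);
```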
Available options for the UniqueNode parser
Option | Description |
---|---|
(string) uniqueNode | Required option: Specify the node name to capture |
(bool) checkShortClosing | Whether to check short closing tag or not |
Examples
uniqueNode
Say you have an XML file in which a series of `<stuff>` elements sits under the root element. You want to capture the stuff nodes, therefore set `uniqueNode` to `"stuff"`.
If you have an XML file in which some of those elements use short closing tags (self-closing elements such as `<stuff ... />`), you still want to capture the stuff nodes, therefore set `uniqueNode` to `"stuff"` and `checkShortClosing` to `true`.
But if your XML file nests one `<stuff>` element inside another, you won't be able to use the UniqueNode parser, because `<stuff>` exists inside of another `<stuff>` node.
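The first two cases can be sketched together. The sample file and the `foo` attribute are illustrative assumptions, and the exact strings returned for self-closing elements may vary between versions.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;

// A mix of regular and short-closing <stuff> elements, none of them nested.
$file = sys_get_temp_dir() . '/stuff.xml';
file_put_contents($file, <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<root-node>
    <stuff foo="bar">first</stuff>
    <stuff foo="baz" />
    <stuff foo="qux">third</stuff>
</root-node>
XML
);

$streamer = XmlStringStreamer::createUniqueNodeParser($file, [
    'uniqueNode' => 'stuff',
    'checkShortClosing' => true, // also recognise <stuff ... />
]);

$nodes = [];
while ($node = $streamer->getNode()) {
    $nodes[] = trim($node);
}

echo count($nodes), ' node(s) captured', PHP_EOL;
```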
Advanced Usage
Progress bar
You can track progress using a closure as the third argument when constructing the stream class. Example with the File stream using the StringWalker parser:
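A sketch, assuming the closure receives the chunk just read and the running total of bytes read. The generated file is a hypothetical stand-in, sized to span more than one chunk.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;
use Prewk\XmlStringStreamer\Parser\StringWalker;
use Prewk\XmlStringStreamer\Stream\File;

// Illustrative file of roughly 30 KB.
$file = sys_get_temp_dir() . '/progress-demo.xml';
file_put_contents($file, "<root>\n" . str_repeat("<item>x</item>\n", 2000) . "</root>\n");

// Total size, used as the denominator of the progress report.
$totalSize = filesize($file);
$seen = [];

// The closure runs every time the streamer pulls a new chunk from the file.
$stream = new File($file, 16384, function ($chunk, $readBytes) use ($totalSize, &$seen) {
    $seen[] = $readBytes;
    echo "Progress: $readBytes / $totalSize", PHP_EOL;
});

$streamer = new XmlStringStreamer(new StringWalker(), $stream);

$count = 0;
while ($node = $streamer->getNode()) {
    $count++;
}
```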
You could of course do something more intelligent than spamming with `echo`.
Accessing the root element (version 0.7.0+)
Setting the parser option `extractContainer` tells the parser to gather everything before and after your intended child element capture. The results are available via the parser's `getExtractedContainer()` method.
Note: `getExtractedContainer()` will return different things depending on whether you've streamed the whole file or not. If you need the containing XML data prematurely, you can get it inside the while loop, but it will contain only the opening elements and will therefore be considered invalid XML by parsers such as SimpleXML.
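A sketch that reads the container after the whole file has been streamed. The sample file is a hypothetical stand-in.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install in this directory

use Prewk\XmlStringStreamer;
use Prewk\XmlStringStreamer\Parser\StringWalker;
use Prewk\XmlStringStreamer\Stream\File;

// Illustrative file.
$file = sys_get_temp_dir() . '/container-demo.xml';
file_put_contents($file, <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<gigantic>
    <customer><firstName>Jane</firstName></customer>
</gigantic>
XML
);

$stream = new File($file, 16384);
$parser = new StringWalker([
    'extractContainer' => true,
]);
$streamer = new XmlStringStreamer($parser, $stream);

// Stream every node first so the closing elements are collected too.
while ($node = $streamer->getNode()) {
    // ... handle $node ...
}

// Everything around the captured nodes, e.g. the root element's tags.
$containingXml = $parser->getExtractedContainer();
echo $containingXml, PHP_EOL;
```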
This method should be considered experimental and may extract unexpected content in edge cases.