Download the PHP package pietercolpaert/hardf without Composer
On this page you can find all versions of the php package pietercolpaert/hardf. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: hardf
Short description: A fast parser for RDF serializations such as Turtle, N-Triples, N-Quads, TriG and N3
License: MIT
Homepage: https://github.com/pietercolpaert/hardf
Information about the package hardf
The hardf turtle, n-triples, n-quads, TriG and N3 parser for PHP
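hardf is a PHP 7.1+ library that lets you handle Linked Data (RDF). It offers:
- Parsing triples/quads from Turtle, TriG, N-Triples, N-Quads and Notation3 (N3)
- Writing triples/quads to Turtle, TriG, N-Triples and N-Quads
Both the parser and the serializer have streaming support.
This library is a port of N3.js to PHP.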
Triple Representation
We use the triple representation of the Node.js library N3.js, ported to PHP. See https://github.com/rdfjs/N3.js/tree/v0.10.0#triple-representation for more information.
We deliberately focused on performance rather than developer friendliness, so this triple representation is implemented with associative arrays rather than PHP objects. What is an object in N3.js is therefore an array here, with the keys subject, predicate, object and, optionally, graph.
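As a minimal sketch of this representation, a single triple could look like the array below. The IRIs are made up for illustration, and the graph key is assumed to be optional, as in N3.js v0.10.

```php
<?php
// A triple is a plain associative array; every term is a string.
// The IRIs below are illustrative only.
$triple = [
    'subject'   => 'http://example.org/cartoons#Tom',
    'predicate' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
    'object'    => 'http://example.org/cartoons#Cat',
    'graph'     => 'http://example.org/mycartoon', // assumed to be optional
];

// Terms are read with ordinary array access.
echo $triple['subject'], "\n";
```

Because a triple is just an array, it can be copied, compared and stored without any library-specific classes.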
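Literals are encoded as strings, in the same way as in N3.js: the value in double quotes, optionally followed by @ and a language tag, or by ^^ and a datatype IRI.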
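The following sketch shows what such literal strings could look like, following the N3.js v0.10 conventions. The helper literalValue() is hypothetical and not part of hardf; it is only here to show that the encoding is plain string data.

```php
<?php
// Plain literal: the value in double quotes.
$plain = '"Tom"';
// Language-tagged literal (N3.js v0.10 uses a lowercase language tag).
$tagged = '"Tom"@en-gb';
// Typed literal: the datatype IRI follows ^^ without angle brackets.
$typed = '"1"^^http://www.w3.org/2001/XMLSchema#integer';

// Hypothetical helper (not part of hardf): returns the text between
// the opening quote and the last double quote of the literal.
function literalValue(string $literal): string
{
    return substr($literal, 1, strrpos($literal, '"') - 1);
}

echo literalValue($tagged), "\n";
echo literalValue($typed), "\n";
```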
Library functions
Install this library using Composer: composer require pietercolpaert/hardf
Writing
TriGWriter is a class that must be instantiated and can write TriG or Turtle.
Example use:
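A minimal sketch of writing a few triples, assuming hardf was installed with Composer so that vendor/autoload.php exists next to the script. The IRIs and prefixes are illustrative.

```php
<?php
// Assumes hardf was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use pietercolpaert\hardf\TriGWriter;

// Prefixes passed here are used to abbreviate IRIs in the output.
$writer = new TriGWriter([
    'prefixes' => [
        'schema' => 'http://schema.org/',
        'dct'    => 'http://purl.org/dc/terms/',
    ],
]);

// Prefixes can also be added after construction.
$writer->addPrefix('ex', 'http://example.org/');

// A triple can be given as separate subject, predicate and object arguments...
$writer->addTriple('http://schema.org/Person', 'http://purl.org/dc/terms/title', '"Person"@en');

// ...or as a single associative array.
$writer->addTriple([
    'subject'   => 'http://example.org/1',
    'predicate' => 'http://purl.org/dc/terms/title',
    'object'    => '"Person1"@en',
]);

// end() finishes the document and returns everything written so far.
$result = $writer->end();
echo $result;
```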
All methods
Parsing
Besides TriG, the TriGParser class also parses Turtle, N-Triples, N-Quads and the W3C Team Submission N3.
All methods
Basic examples for small files
Using return values and passing these to a writer:
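A minimal sketch of this variant, assuming hardf was installed with Composer. When parse() is called without a callback, it is expected to return all parsed triples as an array; the input quad is illustrative.

```php
<?php
// Assumes hardf was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;

$parser = new TriGParser(['format' => 'n-quads']);
$writer = new TriGWriter();

// Without a callback, the parsed triples are returned as an array.
$triples = $parser->parse(
    '<http://example.org/A> <http://example.org/B> <http://example.org/C> <http://example.org/G> .'
);

// Hand all triples to the writer in one call and print the result.
$writer->addTriples($triples);
echo $writer->end();
```

This approach keeps every triple in memory, so it suits small inputs only.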
Using callbacks and passing these to a writer:
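A minimal sketch of the callback variant, assuming hardf was installed with Composer. The callback is assumed to receive an error (or null) and a triple (or null); the counter $seen is only there for illustration.

```php
<?php
// Assumes hardf was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;

$parser = new TriGParser(['format' => 'n-quads']);
$writer = new TriGWriter(['format' => 'trig']);
$seen = 0; // illustrative counter of parsed triples

$input = "<http://example.org/A> <http://example.org/B> <http://example.org/C> <http://example.org/G> .\n"
       . "<http://example.org/A2> <http://example.org/B2> <http://example.org/C2> <http://example.org/G2> .\n";

// The callback is invoked once per triple, or with an error if parsing fails.
$parser->parse($input, function ($error, $triple) use ($writer, &$seen) {
    if (isset($error)) {
        echo 'Error occurred: ' . $error . "\n";
    } elseif (isset($triple)) {
        $writer->addTriple($triple);
        $seen++;
    }
});

// Finish the document and print it.
echo $writer->end();
```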
Example using chunks and keeping prefixes
When you need to parse a large file, parse it chunk by chunk and process the triples as soon as they become available. You can do that as follows:
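A sketch of chunked parsing, assuming hardf was installed with Composer and that the TriGParser constructor accepts a triple callback and a prefix callback after the options. The $chunks array is a hypothetical stand-in for data read piece by piece from a large file, and is split on line boundaries to keep the example simple.

```php
<?php
// Assumes hardf was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;

$writer = new TriGWriter();
$count = 0; // illustrative counter of parsed triples

// Called for every parsed triple, or with an error if parsing fails.
$tripleCallback = function ($error, $triple) use ($writer, &$count) {
    if (isset($error)) {
        throw $error instanceof \Throwable ? $error : new \RuntimeException((string) $error);
    }
    if (isset($triple)) {
        $writer->addTriple($triple);
        $count++;
    }
};

// Called for every prefix declaration, so the writer can reuse the prefixes.
$prefixCallback = function ($prefix, $iri) use ($writer) {
    $writer->addPrefix($prefix, $iri);
};

$parser = new TriGParser(['format' => 'turtle'], $tripleCallback, $prefixCallback);

// Hypothetical chunks; in practice these would come from fread() on a large file.
$chunks = [
    "@prefix ex: <http://example.org/> .\nex:Tom a ex:Cat .\n",
    "ex:Jerry a ex:Mouse .\n",
];
foreach ($chunks as $chunk) {
    $parser->parseChunk($chunk);
}
$parser->end(); // must be called after the last chunk

$output = $writer->end();
echo $output;
```

For a real large file, the writer output would also be flushed regularly instead of being collected until the end.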
Parser options
- format: input format (case-insensitive)
- blankNodePrefix (defaults to b0_): prefix forced on blank node names, e.g. TriGParser(["blankNodePrefix" => 'foo']) will parse _:bar as _:foobar.
- documentIRI: sets the base URI used to resolve relative URIs (not applicable if format indicates n-triples or n-quads)
- lexer: allows usage of your own lexer class. A lexer must provide the following public methods:
  - tokenize(string $input, bool $finalize = true): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
  - tokenizeChunk(string $input): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
  - end(): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
- explicitQuantifiers
- [...]
Empty document base IRI
Some Turtle and N3 documents use IRIs that are relative to the base IRI, for example an empty IRI reference (<>) as the subject of a statement.
To parse such documents properly, the document base IRI must be known. Otherwise we might end up with empty IRIs (e.g. for the subject of such a statement).
Sometimes the base IRI is declared in the document itself (e.g. with an @base directive), but sometimes it is missing.
In such a case the Turtle specification requires us to follow section 5.1 of RFC 3986, which says that if the base IRI is not embedded in the document, it should be assumed to be the document retrieval URI (e.g. the URL you downloaded the document from, or a file path converted to a URL). Unfortunately, the hardf parser cannot guess this, so you have to provide it using the documentIRI parser creation option.
Long story short: if you run into the "subject/predicate/object on line X can not be parsed without knowing the the document base IRI.(...)" error, please initialize the parser with the documentIRI option.
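A minimal sketch of supplying the base IRI, assuming hardf was installed with Composer. The base IRI http://some.base/iri/ and the Turtle statement are illustrative.

```php
<?php
// Assumes hardf was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use pietercolpaert\hardf\TriGParser;

// The document itself declares no base IRI, so we pass one explicitly.
$parser = new TriGParser([
    'format'      => 'turtle',
    'documentIRI' => 'http://some.base/iri/',
]);

// Both the empty subject <> and the relative predicate are resolved
// against the documentIRI given above.
$triples = $parser->parse('<> <someProperty> "some value" .');

foreach ($triples as $triple) {
    echo $triple['subject'], ' ', $triple['predicate'], ' ', $triple['object'], "\n";
}
```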
Utility
A static class with a couple of helpful functions for handling our specific triple representation. It will help you to create and evaluate literals, IRIs, and expand prefixes.
See the documentation at https://github.com/RubenVerborgh/N3.js#utility for more information.
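The sketch below shows how some of these helpers might be used, assuming hardf was installed with Composer and that the Util class mirrors the N3.js v0.10 utility functions of the same names. The signatures are assumptions; check them against the Util class in your installed version.

```php
<?php
// Assumes hardf was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use pietercolpaert\hardf\Util;

$literal = '"Mickey Mouse"@en';

// Distinguish IRIs from literals.
$isIri     = Util::isIRI('http://example.org/cartoons#Mickey');
$isLiteral = Util::isLiteral($literal);

// Take a literal apart.
$value    = Util::getLiteralValue($literal);
$language = Util::getLiteralLanguage($literal);

// Expand a prefixed name with a prefix map.
$expanded = Util::expandPrefixedName('rdfs:label', [
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
]);

var_dump($isIri, $isLiteral);
echo $value, "\n", $language, "\n", $expanded, "\n";
```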
Two executables
We also offer two simple tools in bin/ as example implementations: a validator and a translator. Try for example:
Performance
We compared performance on two Turtle files, parsing each of them with the EasyRDF and ARC2 libraries for PHP, the N3.js library for Node.js, and hardf. These were the results:
#triples | framework | time (ms) | memory (MB) |
---|---|---|---|
1,866 | Hardf without opcache | 27.6 | 0.722 |
1,866 | Hardf with opcache | 24.5 | 0.380 |
1,866 | EasyRDF without opcache | 5,166.5 | 2.772 |
1,866 | EasyRDF with opcache | 5,176.2 | 2.421 |
1,866 | ARC2 with opcache | 71.9 | 1.966 |
1,866 | N3.js | 24.0 | 28.xxx |
3,896,560 | Hardf without opcache | 40,017.7 | 0.722 |
3,896,560 | Hardf with opcache | 33,155.3 | 0.380 |
3,896,560 | N3.js | 7,004.0 | 59.xxx |
3,896,560 | ARC2 with opcache | 203,152.6 | 3,570.808 |
License, status and contributions
The hardf library is copyrighted by Ruben Verborgh and Pieter Colpaert and released under the MIT License.
Contributions are welcome, and bug reports or pull requests are always helpful. If you plan to implement a larger feature, it's best to discuss this first by filing an issue.