Download the PHP package remorhaz/php-unilex without Composer
On this page you can find all versions of the php package remorhaz/php-unilex. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: php-unilex
Short description: UniLex: lexical analyzer generator with Unicode support, written in PHP
License: MIT
Homepage: https://github.com/remorhaz/php-unilex
Information about the package php-unilex
UniLex
UniLex is a lexical analyzer generator (similar to lex and flex) with Unicode support.
It's written in PHP and generates code in PHP.
Requirements
- PHP 8
License
The UniLex library is licensed under the MIT license.
Installation
Installation works like that of any other Composer library: run composer require remorhaz/php-unilex in your project directory.
Usage
Quick start in example
Let's imagine we want to write a simple calculator and need a lexer (lexical analyzer) that provides a stream of IDs, numbers and operators. Create a new Composer project and execute the following command from the project directory:
The next step is creating a lexer specification in the LexerSpec.php file. We use the @lexToken tag in comments to specify the regular expression for a token:
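A minimal sketch of what such a specification might look like. It assumes that the context object exposes a setNewToken() method taking an integer token ID. The method name, the token IDs, the regular expressions and their escaping rules are all assumptions that should be checked against the library before use. The file is meant to be processed by the matcher generator, not executed directly.

```php
<?php
// LexerSpec.php: hypothetical specification for the calculator lexer.
// Assumption: $context has a setNewToken(int) method; verify the name
// and signature against TokenMatcherContextInterface before relying on it.

/**
 * @var \Remorhaz\UniLex\Lexer\TokenMatcherContextInterface $context
 * @lexToken /[a-zA-Z]+/
 */
$context->setNewToken(0); // illustrative ID for identifiers

/**
 * @lexToken /[0-9]+/
 */
$context->setNewToken(1); // illustrative ID for numbers

/**
 * The escaping inside this character class is an assumption about the
 * regular expression dialect.
 *
 * @lexToken /[+\-*\/]/
 */
$context->setNewToken(2); // illustrative ID for operators
```

Each DocBlock introduces one token rule, and the code below it runs when that rule matches.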
The next step is building a token matcher from the specification:
Now we have a compiled token matcher in the TokenMatcher.php file. Let's use it and read all tokens from the buffer:
When executed, this script outputs:
Let's go a bit further and make it possible to retrieve the text representation of every token from the input buffer. We need to modify the lexer specification to attach the matched text to each non-EOI token as an attribute:
After rebuilding the token matcher with the CLI utility, we need to modify the output loop of our example program so that it prints the text along with the token IDs:
And now the program prints:
CLI
You can use the command-line utility to build a token matcher from a specification:
Specification
A specification is a PHP file that is split into parts by DocBlock comments with special tags. A special variable, $context, holds a context object that implements the \Remorhaz\UniLex\Lexer\TokenMatcherContextInterface interface. The current implementation also uses an int variable, $char, that contains the current symbol (TODO: should be moved into the context object).
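To make the layout concrete, here is a rough illustration of how a PHP file can be cut into tagged parts. It uses PHP's built-in tokenizer, not UniLex's own parsing code. The function name splitSpecification() and the sample specification are hypothetical, and the code bodies inside the sample are placeholders.

```php
<?php
declare(strict_types=1);

// Hypothetical specification source. The tags are the ones described in
// this section; the code bodies are placeholders, not real UniLex calls.
$spec = <<<'SPEC'
<?php
/**
 * @lexHeader
 */
namespace Example;

/**
 * @lexToken /[0-9]+/
 */
// set up a number token in $context here

/**
 * @lexOnError
 */
return false;
SPEC;

/**
 * Splits PHP source into parts, one per DocBlock that carries a @lex* tag.
 *
 * @return list<array{tag: string, arg: string, code: string}>
 */
function splitSpecification(string $source): array
{
    $parts = [];
    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single characters.
        [$id, $text] = is_array($token) ? $token : [null, $token];
        if ($id === T_OPEN_TAG) {
            continue;
        }
        if ($id === T_DOC_COMMENT
            && preg_match('/@(lex\w+)[ \t]*([^\r\n]*)/', $text, $m) === 1
        ) {
            // A tagged DocBlock starts a new part.
            $parts[] = ['tag' => $m[1], 'arg' => trim($m[2]), 'code' => ''];
            continue;
        }
        if ($parts !== []) {
            // Everything else belongs to the most recent part.
            $parts[array_key_last($parts)]['code'] .= $text;
        }
    }

    return array_map(
        static fn (array $part): array => [
            'tag' => $part['tag'],
            'arg' => $part['arg'],
            'code' => trim($part['code']),
        ],
        $parts
    );
}

foreach (splitSpecification($spec) as $part) {
    printf("%s %s => %s\n", $part['tag'], $part['arg'], $part['code']);
}
```

Each resulting part pairs a tag (and its optional argument, such as a regular expression) with the code that follows it, which is the structure the tag descriptions below rely on.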
@lexHeader
This block can contain namespace and use statements that will be used during matcher generation.
@lexBeforeMatch
This block is executed before the matching procedure begins and can be used to initialize additional variables.
@lexOnTransition
This block is executed on each symbol matched by a token's regular expression.
@lexToken /regexp/
This block is executed when the given regular expression matches the input buffer. Most commonly it just sets up a new token in the context object.
@lexMode 'mode_name'
This tag tells the parser that the accompanying @lexToken expression matches only if the current lexical mode is mode_name. The lexical mode can be switched with the $context->setMode('mode_name') method. Lexical modes make it possible to have several "sub-grammars" in one specification (e.g. some tokens can be recognized only in comments or strings).
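The idea of lexical modes can be illustrated without UniLex at all. The following hand-written sketch keeps one rule table per mode and lets a rule switch the current mode, so the text inside quotes is tokenized by a different sub-grammar. The function name, token names and patterns are all hypothetical, and it uses plain preg_match, not a generated matcher.

```php
<?php
declare(strict_types=1);

/**
 * Toy tokenizer with two lexical modes: 'default' and 'string'.
 *
 * @return list<array{0: string, 1: string}> pairs of [token name, text]
 */
function tokenizeWithModes(string $input): array
{
    // mode => list of [token name, pattern anchored with \G, next mode or null]
    $rules = [
        'default' => [
            ['QUOTE_OPEN', '/\G"/', 'string'],
            ['WORD', '/\G[a-z]+/', null],
            ['SPACE', '/\G\s+/', null],
        ],
        'string' => [
            ['QUOTE_CLOSE', '/\G"/', 'default'],
            ['TEXT', '/\G[^"]+/', null],
        ],
    ];

    $mode = 'default';
    $offset = 0;
    $tokens = [];
    while ($offset < strlen($input)) {
        $matched = false;
        // Only the rules of the current mode are tried.
        foreach ($rules[$mode] as [$name, $pattern, $nextMode]) {
            if (preg_match($pattern, $input, $m, 0, $offset) === 1) {
                $tokens[] = [$name, $m[0]];
                $offset += strlen($m[0]);
                $mode = $nextMode ?? $mode; // a rule may switch the mode
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new RuntimeException("No rule matches at offset $offset in mode '$mode'");
        }
    }

    return $tokens;
}

foreach (tokenizeWithModes('say "a b"') as [$name, $text]) {
    printf("%s: '%s'\n", $name, $text);
}
```

In the default mode the space is a separate token, while inside the quotes the same character becomes part of a single TEXT token, because a different rule table is active there.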
@lexOnError
This block is executed if the matcher fails to match any of the tokens' regular expressions. By default it just returns false.
All versions of php-unilex with dependencies
phpdocumentor/reflection-docblock Version ^4.3 || ^5
nikic/php-parser Version ^4.12 || ^5
remorhaz/int-rangesets Version ^0.3
remorhaz/ucd Version ^0.3
symfony/console Version ^6.1 || ^7
thecodingmachine/safe Version ^1.3.1 || ^2