Download the PHP package ryangjchandler/lexical without Composer
Package: lexical
Short description: Attribute-driven tokenisation for PHP.
License: MIT
Homepage: https://github.com/ryangjchandler/lexical
Lexical
Installation
You can install the package via Composer:
composer require ryangjchandler/lexical
Usage
Let's write a simple lexer for mathematical expressions. The expressions can contain numbers (integers only) and a handful of operators (+, -, *, /).
Begin by creating a new enumeration that describes the token types.
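A minimal sketch of such an enumeration (PHP 8.1 or later), covering the integers and four operators described above. The case names are illustrative, and no attributes are attached yet:

```php
<?php

declare(strict_types=1);

// One case per kind of token the lexer should recognise.
enum TokenType
{
    case Number;    // an integer such as 42
    case Add;       // +
    case Subtract;  // -
    case Multiply;  // *
    case Divide;    // /
}
```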
Lexical provides a set of attributes that can be added to each case in an enumeration:
- Regex: accepts a single regular expression.
- Literal: accepts a string of consecutive characters that is matched exactly.
- Error: designates a specific enumeration case as the "error" type.
These attributes are added to the individual cases of the TokenType enumeration.
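The sketch below shows the idea in plain PHP 8.1. Because it has to run without the package, it declares its own stand-in Regex and Literal attribute classes in a hypothetical LexicalSketch namespace; the package's real attribute classes, namespaces and constructor parameters may differ. The hypothetical patternsFor() helper reads the attributes back through reflection to illustrate how attribute-driven definitions can be consumed; it is not Lexical's implementation.

```php
<?php

declare(strict_types=1);

namespace LexicalSketch;

use Attribute;
use ReflectionEnum;

// Stand-in attributes named after Lexical's so this runs without the package.
#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Regex
{
    public function __construct(public string $pattern) {}
}

#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Literal
{
    public function __construct(public string $literal) {}
}

// Enum cases are class constants, so TARGET_CLASS_CONSTANT attributes apply to them.
enum TokenType
{
    #[Regex('[0-9]+')]
    case Number;

    #[Literal('+')]
    case Add;

    #[Literal('-')]
    case Subtract;

    #[Literal('*')]
    case Multiply;

    #[Literal('/')]
    case Divide;
}

/**
 * Hypothetical helper: maps each case name to a regex, escaping literals.
 *
 * @return array<string, string>
 */
function patternsFor(string $enum): array
{
    $patterns = [];
    foreach ((new ReflectionEnum($enum))->getCases() as $case) {
        foreach ($case->getAttributes() as $attribute) {
            $rule = $attribute->newInstance();
            if ($rule instanceof Regex) {
                $patterns[$case->getName()] = $rule->pattern;
            } elseif ($rule instanceof Literal) {
                $patterns[$case->getName()] = preg_quote($rule->literal, '/');
            }
        }
    }
    return $patterns;
}
```

Escaping literals with preg_quote() lets both kinds of attribute be treated uniformly as regular expressions later on.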
With the attributes in place, we can start to build a lexer using the LexicalBuilder.
The readTokenTypesFrom() method tells the builder where to look for the tokenising attributes (in this example, the TokenType enumeration). The build() method then reads those attributes and returns an object that implements LexerInterface, configured to look for the specified token types.
Then it's just a case of calling the tokenise() method on the lexer object to retrieve an array of tokens.
The tokenise() method returns a list of tuples, where the first item is the "type" (TokenType in this example), the second item is the "literal" (a string containing the matched characters) and the third item is the "span" of the token (the start and end positions in the original string).
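To make the build-then-tokenise flow concrete, here is a self-contained sketch of an attribute-driven lexer. It is not Lexical's implementation: the SketchLexer class is hypothetical, its constructor plays the role of build(), and the Regex and Literal attributes are local stand-ins. Patterns are tried in declaration order and the first match wins, which may not be how the package resolves overlapping patterns. Spans here are [start, end] byte offsets with an exclusive end.

```php
<?php

declare(strict_types=1);

namespace LexicalSketch;

use Attribute;
use ReflectionEnum;
use RuntimeException;

// Stand-in attributes named after Lexical's; not the package's classes.
#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Regex
{
    public function __construct(public string $pattern) {}
}

#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Literal
{
    public function __construct(public string $literal) {}
}

enum TokenType
{
    #[Regex('[0-9]+')]
    case Number;
    #[Literal('+')]
    case Add;
    #[Literal('-')]
    case Subtract;
    #[Literal('*')]
    case Multiply;
    #[Literal('/')]
    case Divide;
}

final class SketchLexer
{
    /** @var list<array{0: TokenType, 1: string}> (case, anchored regex) pairs in declaration order */
    private array $rules = [];

    // Plays the role of build(): turn each attribute into an anchored regex once, up front.
    public function __construct(string $enum)
    {
        foreach ((new ReflectionEnum($enum))->getCases() as $case) {
            foreach ($case->getAttributes() as $attribute) {
                $rule = $attribute->newInstance();
                $pattern = match (true) {
                    $rule instanceof Regex => $rule->pattern,
                    $rule instanceof Literal => preg_quote($rule->literal, '/'),
                };
                // \G pins the match to the offset passed to preg_match().
                $this->rules[] = [$case->getValue(), '/\G(?:' . $pattern . ')/'];
            }
        }
    }

    /** @return list<array{0: TokenType, 1: string, 2: array{0: int, 1: int}}> */
    public function tokenise(string $input): array
    {
        $tokens = [];
        $offset = 0;
        while ($offset < strlen($input)) {
            foreach ($this->rules as [$type, $regex]) {
                // First rule that matches a non-empty string at $offset wins.
                if (preg_match($regex, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                    $end = $offset + strlen($m[0]);
                    $tokens[] = [$type, $m[0], [$offset, $end]];
                    $offset = $end;
                    continue 2; // resume the outer while loop
                }
            }
            throw new RuntimeException("Unexpected character at offset {$offset}");
        }
        return $tokens;
    }
}
```

For example, (new SketchLexer(TokenType::class))->tokenise('12+3') yields three tuples: Number '12' spanning [0, 2], Add '+' spanning [2, 3] and Number '3' spanning [3, 4]. Calling it on '1 + 2' throws, because the spaces match no pattern.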
Skipping whitespace and other patterns
Continuing with the example of a mathematical expression, the lexer currently understands 1+2, but it would fail to tokenise 1 + 2 (the same expression with added whitespace). This is because, by default, every character in the input must be matched by one of the patterns.
The whitespace is insignificant in this case, so it can be skipped safely. To do this, we need to add a Lexer attribute to the TokenType enumeration itself and pass it a regular expression that matches the characters we want to skip.
Now the lexer will skip over any whitespace characters and successfully tokenise 1 + 2.
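The sketch below adds an enum-level skip pattern to the same idea. The Lexer attribute here is a local stand-in whose skip parameter name is an assumption, as is the hypothetical tokenise() function; to keep it short, the operators are written as escaped regular expressions so that only one case-level attribute class is needed.

```php
<?php

declare(strict_types=1);

namespace LexicalSketch;

use Attribute;
use ReflectionEnum;
use RuntimeException;

// Stand-in for a case-level pattern.
#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Regex
{
    public function __construct(public string $pattern) {}
}

// Stand-in for the enum-level Lexer attribute; the "skip" parameter name is assumed.
#[Attribute(Attribute::TARGET_CLASS)]
final class Lexer
{
    public function __construct(public string $skip) {}
}

#[Lexer(skip: '[ \t\n\f]+')]
enum TokenType
{
    #[Regex('[0-9]+')]
    case Number;
    #[Regex('\+')]
    case Add;
    #[Regex('-')]
    case Subtract;
    #[Regex('\*')]
    case Multiply;
    #[Regex('\/')]
    case Divide;
}

/** Hypothetical helper: tokenises $input, discarding anything the skip pattern matches. */
function tokenise(string $enum, string $input): array
{
    $reflection = new ReflectionEnum($enum);

    $skip = null;
    foreach ($reflection->getAttributes(Lexer::class) as $attribute) {
        $skip = '/\G(?:' . $attribute->newInstance()->skip . ')/';
    }

    $rules = [];
    foreach ($reflection->getCases() as $case) {
        foreach ($case->getAttributes(Regex::class) as $attribute) {
            $rules[] = [$case->getValue(), '/\G(?:' . $attribute->newInstance()->pattern . ')/'];
        }
    }

    $tokens = [];
    $offset = 0;
    while ($offset < strlen($input)) {
        // Skipped characters advance the offset but never become tokens.
        if ($skip !== null && preg_match($skip, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
            $offset += strlen($m[0]);
            continue;
        }
        foreach ($rules as [$type, $regex]) {
            if (preg_match($regex, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $tokens[] = [$type, $m[0], [$offset, $offset + strlen($m[0])]];
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        throw new RuntimeException("Unexpected character at offset {$offset}");
    }
    return $tokens;
}
```

Because skipping only advances the offset, the spans of the remaining tokens still point into the original string: in '1 + 2' the second number spans [4, 5].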
Error handling
By default, when a lexer encounters an unexpected character, it throws an UnexpectedCharacterException.
To avoid the exception, use the Error attribute mentioned above to mark an enum case as the "error" type.
Now when the input is tokenised, the unrecognised character will be consumed like any other token and will have a type of TokenType::Error.
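A sketch of both behaviours, again using hypothetical stand-ins: without an Error case, the tokenise() function throws a locally declared UnexpectedCharacterException; with one, each unmatched byte becomes an error token. Consuming a single byte per error is this sketch's own choice (it would split multibyte UTF-8 characters), and the package may consume unrecognised input differently.

```php
<?php

declare(strict_types=1);

namespace LexicalSketch;

use Attribute;
use ReflectionEnum;
use RuntimeException;

#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Regex
{
    public function __construct(public string $pattern) {}
}

// Stand-in marker for the "error" case; namespaced, so it does not clash with PHP's \Error.
#[Attribute(Attribute::TARGET_CLASS_CONSTANT)]
final class Error
{
}

// Local stand-in for the package's exception of the same name.
final class UnexpectedCharacterException extends RuntimeException
{
}

enum TokenType
{
    #[Regex('[0-9]+')]
    case Number;
    #[Regex('\+')]
    case Add;
    #[Error]
    case Error;
}

// Same grammar, but with no error case.
enum StrictTokenType
{
    #[Regex('[0-9]+')]
    case Number;
    #[Regex('\+')]
    case Add;
}

function tokenise(string $enum, string $input): array
{
    $rules = [];
    $errorType = null;
    foreach ((new ReflectionEnum($enum))->getCases() as $case) {
        foreach ($case->getAttributes(Regex::class) as $attribute) {
            $rules[] = [$case->getValue(), '/\G(?:' . $attribute->newInstance()->pattern . ')/'];
        }
        if ($case->getAttributes(Error::class) !== []) {
            $errorType = $case->getValue();
        }
    }

    $tokens = [];
    $offset = 0;
    while ($offset < strlen($input)) {
        foreach ($rules as [$type, $regex]) {
            if (preg_match($regex, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $tokens[] = [$type, $m[0], [$offset, $offset + strlen($m[0])]];
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        if ($errorType === null) {
            throw new UnexpectedCharacterException("Unexpected character '{$input[$offset]}' at offset {$offset}");
        }
        // With an error case, emit the unmatched byte as a token and keep going.
        $tokens[] = [$errorType, $input[$offset], [$offset, $offset + 1]];
        $offset++;
    }
    return $tokens;
}
```

With TokenType, the input '1+a' produces a final token of type TokenType::Error with the literal 'a'; with StrictTokenType, the same input throws.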
Custom Token objects
If you prefer to work with dedicated objects instead of Lexical's default tuple values for each token, you can provide a custom callback to map the matched token type and literal into a custom object.
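The sketch below illustrates the callback idea: when a mapper is passed, each matched type and literal are handed to it, and its return value is stored instead of the default tuple. The Token class and the tokenise() signature are hypothetical, and to keep the focus on the mapping step the patterns come from a match() expression on the enum rather than from attributes. How Lexical itself registers the callback is not shown here.

```php
<?php

declare(strict_types=1);

namespace LexicalSketch;

use RuntimeException;

enum TokenType
{
    case Number;
    case Add;

    // Patterns via match() keep this sketch short; Lexical reads them from attributes instead.
    public function pattern(): string
    {
        return match ($this) {
            self::Number => '[0-9]+',
            self::Add => '\+',
        };
    }
}

// Hypothetical dedicated token object built by the callback.
final class Token
{
    public function __construct(
        public readonly TokenType $type,
        public readonly string $literal,
    ) {
    }
}

/**
 * Hypothetical tokeniser: returns [type, literal, [start, end]] tuples by default,
 * or whatever $produce returns for each (type, literal) pair when a callback is given.
 */
function tokenise(string $input, ?callable $produce = null): array
{
    $tokens = [];
    $offset = 0;
    while ($offset < strlen($input)) {
        foreach (TokenType::cases() as $type) {
            if (preg_match('/\G(?:' . $type->pattern() . ')/', $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $end = $offset + strlen($m[0]);
                $tokens[] = $produce === null ? [$type, $m[0], [$offset, $end]] : $produce($type, $m[0]);
                $offset = $end;
                continue 2;
            }
        }
        throw new RuntimeException("Unexpected character at offset {$offset}");
    }
    return $tokens;
}
```

For example, tokenise('1+2', fn (TokenType $type, string $literal) => new Token($type, $literal)) returns three Token objects, while tokenise('1+2') keeps the default tuples.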
Testing
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
Credits
- Ryan Chandler
- All Contributors
License
The MIT License (MIT). Please see License File for more information.