Download the PHP package nette/tokenizer without Composer
On this page you can find all versions of the php package nette/tokenizer. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Package: tokenizer
Short Description: Nette Tokenizer
License: BSD-3-Clause, GPL-2.0-only, GPL-3.0-only
Homepage: https://nette.org
Information about the package tokenizer
Nette Tokenizer [DISCONTINUED]
Introduction
Tokenizer is a tool that uses regular expressions to split a given string into tokens. What the hell is that good for, you might ask? Well, you can create your own languages!
Documentation can be found on the website. If you like it, please make a donation now. Thank you!
Installation:
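If you do use Composer, the package is installed the usual way:

```shell
composer require nette/tokenizer
```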
It requires PHP version 7.1 and supports PHP up to 8.1.
Support Me
Do you like Nette Tokenizer? Are you looking forward to the new features?
Thank you!
Usage
Let's create a simple tokenizer that separates strings into numbers, whitespace, and letters.
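A sketch of such a tokenizer; the constructor takes an array mapping token types to regular expressions:

```php
use Nette\Tokenizer\Tokenizer;

$tokenizer = new Tokenizer([
	T_DNUMBER => '\d+',
	T_WHITESPACE => '\s+',
	T_STRING => '\w+',
]);
```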
Hint: In case you are wondering where the T_ constants come from, they are PHP's internal token types used for parsing code. They cover most of the common token names we usually need. Keep in mind their value is not guaranteed, so don't use numbers for comparison.
Now when we give it a string, it will return a token stream (Nette\Tokenizer\Stream) of tokens (Nette\Tokenizer\Token).
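For example:

```php
$stream = $tokenizer->tokenize("say \n123");
```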
The resulting array of tokens $stream->tokens would look like this:
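For the input "say \n123" above, roughly (each token carries its value, type, and offset; the exact shape is illustrative):

```php
[
	new Token('say', T_STRING, 0),    // value, type, offset
	new Token(" \n", T_WHITESPACE, 3),
	new Token('123', T_DNUMBER, 5),
]
```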
Also, you can access the individual properties of a token:
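Each token exposes its value, type, and offset, so for the first token above:

```php
$stream->tokens[0]->value;   // 'say'
$stream->tokens[0]->type;    // T_STRING
$stream->tokens[0]->offset;  // 0
```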
Simple, isn't it?
Processing the tokens
Now we know how to create tokens from a string. Let's process them effectively using Nette\Tokenizer\Stream. It has a lot of really awesome methods if you need to traverse tokens!
Let's try to parse a simple annotation from PHPDoc and create an object from it. What regular expressions do we need for tokens? All the annotations start with @, then there is a name, whitespace and its value. That gives us:
- @ for the annotation start
- \s+ for whitespace
- \w+ for strings
(Never use capturing subpatterns in Tokenizer's regular expressions like '(ab)+c'; use only non-capturing ones like '(?:ab)+c'.)
This should work on simple annotations, right? Now let's show the input string that we will try to parse.
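A hypothetical input, made up here so that it matches the walkthrough below:

```php
$input = '
	@name David Grudl
	@package Nette
';
```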
Let's create a Parser class that will accept the string and return an array of pairs [name, value]. It will be very naive and simple.
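A sketch of such a class; it is illustrative and built only on the Stream methods explained in the following paragraphs:

```php
use Nette\Tokenizer\Tokenizer;
use Nette\Tokenizer\Stream;

class Parser
{
	const T_AT = 1;
	const T_WHITESPACE = 2;
	const T_STRING = 3;

	/** @var Tokenizer */
	private $tokenizer;

	/** @var Stream */
	private $stream;

	public function __construct()
	{
		$this->tokenizer = new Tokenizer([
			self::T_AT => '@',
			self::T_WHITESPACE => '\s+',
			self::T_STRING => '\w+',
		]);
	}

	public function parse(string $input): array
	{
		$this->stream = $this->tokenizer->tokenize($input);

		$annotations = [];
		while ($this->stream->nextToken()) {
			if ($this->stream->isCurrent(self::T_AT)) {
				$annotations[] = $this->parseAnnotation();
			}
		}
		return $annotations;
	}

	private function parseAnnotation(): array
	{
		// the token right after @ is the annotation name
		$name = $this->stream->joinUntil(self::T_WHITESPACE);

		// skip the whitespace between the name and the value
		$this->stream->nextUntil(self::T_STRING);

		// everything up to the next @ (or the end of input) is the value
		$content = $this->stream->joinUntil(self::T_AT);

		return [$name, trim($content)];
	}
}
```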
So what does the parse() method do? It iterates over the tokens and searches for @, which is the symbol annotations start with. Calling nextToken() moves the cursor to the next token. The method isCurrent() checks whether the token at the cursor is of the given type. Then, if an @ is found, the parse() method calls parseAnnotation(), which expects the annotations to be in a very specific format.
First, using the method joinUntil(), the stream keeps moving the cursor and appending the values of the tokens to the buffer until it finds a token of the required type; then it stops and returns the buffer output. Because there is only one token of type T_STRING at that given position and it's 'name', the variable $name will hold the value 'name'.
The method nextUntil() is similar to joinUntil(), but it has no buffer. It only moves the cursor until it finds the token. So this call simply skips all the whitespace after the annotation name.
And then there is another joinUntil() that searches for the next @. This specific call will return "David Grudl\n ".
And there we go, we've parsed one whole annotation! The $content probably ends with whitespace, so we have to trim it. Now we can return this specific annotation as the pair [$name, $content].
Try copy-pasting the code and running it. If you dump the $annotations variable, you should see output similar to this:
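Putting it together (the dumped output below is illustrative and matches the hypothetical $input above):

```php
$parser = new Parser;
$annotations = $parser->parse($input);

// $annotations now contains:
// [
//     ['name', 'David Grudl'],
//     ['package', 'Nette'],
// ]
```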
Stream methods
The stream can return the current token using the method currentToken(), or only its value using currentValue().
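For illustration, with the $stream from the "say \n123" example above:

```php
$stream->nextToken();     // move the cursor to the first token
$stream->currentToken();  // the Token at the cursor, here the 'say' token
$stream->currentValue();  // just its value: 'say'
```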
nextToken() moves the cursor and returns the token. If you give it no arguments, it simply returns the next token. nextValue() is just like nextToken(), but it only returns the token value.
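A sketch of the difference (the exact return values are illustrative):

```php
$stream->nextToken();           // the next token, whatever its type
$stream->nextToken(T_DNUMBER);  // the next token only if it is a T_DNUMBER, otherwise null
$stream->nextValue(T_DNUMBER);  // the same, but just the string value
```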
Most of the methods also accept multiple arguments so you can search for multiple types at once.
You can also search by the token value.
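Both of these are accepted by the same methods:

```php
// several token types at once
$stream->nextToken(T_DNUMBER, T_STRING);

// or match by the literal token value
$stream->nextToken('say');
```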
nextUntil() moves the cursor and returns an array of all the tokens it sees until it finds the desired token, but it stops before that token. It can accept multiple arguments.
joinUntil() is similar to nextUntil(), but it concatenates all the tokens it passes through and returns a string.
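For instance, on the "say \n123" stream (each call starts from a reset cursor):

```php
$stream->reset();
$tokens = $stream->nextUntil(T_WHITESPACE);  // array with the 'say' token; cursor stops before the whitespace

$stream->reset();
$text = $stream->joinUntil(T_WHITESPACE);    // the same values joined into one string: 'say'
```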
joinAll() simply concatenates all the remaining token values and returns the result. It moves the cursor to the end of the token stream.
nextAll() is just like joinAll(), but it returns an array of the tokens.
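Side by side, again starting from a reset cursor:

```php
$stream->reset();
$text = $stream->joinAll();    // "say \n123", the whole input back as one string

$stream->reset();
$tokens = $stream->nextAll();  // all three Token objects in an array
```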
isCurrent() checks if the current token or the current token's value is equal to one of the given arguments.
isNext() is just like isCurrent(), but it checks the next token. isPrev() is just like isCurrent(), but it checks the previous token.
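All three take the same kinds of arguments as the methods above:

```php
$stream->isCurrent(T_STRING);        // is the token under the cursor a T_STRING?
$stream->isCurrent(T_STRING, '123'); // ...or does it have the value '123'?
$stream->isNext(T_WHITESPACE);       // the same check, one token ahead
$stream->isPrev(T_DNUMBER);          // the same check, one token back
```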
And the last method, reset(), resets the cursor, so you can iterate over the token stream again.
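For example, to make a second pass:

```php
$stream->reset();
while ($token = $stream->nextToken()) {
	// a second pass over the same tokens
}
```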