A library for writing recursive descent parsers in PHP.
- PHP >= 7.1
The preferred installation method is Composer:
composer require yosymfony/parser-utils
First, you need to create a lexer. This one recognizes numbers, plus signs, minus signs, and whitespace:
use Yosymfony\ParserUtils\BasicLexer;
$lexer = new BasicLexer([
'/^([0-9]+)/x' => 'T_NUMBER',
'/^(\+)/x' => 'T_PLUS',
'/^(-)/x' => 'T_MINUS',
'/^\s+/' => 'T_SPACE', // We do not surround it with parentheses because
// this is not meaningful for us in this case
]);
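To make the role of the anchored patterns more concrete, here is a minimal standalone sketch of how an ordered list of `^`-anchored regular expressions can turn a line into tokens. It is not the library's implementation: `sketchTokenize` is a hypothetical helper, tokens are plain arrays, and it assumes that a pattern without a capturing group (such as the whitespace one) is consumed without producing a token.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the library: tokenizes one line using
 * an ordered map of anchored regular expressions to token names.
 */
function sketchTokenize(array $terminals, string $line): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($line);

    while ($offset < $length) {
        $rest = substr($line, $offset);
        $matched = false;

        foreach ($terminals as $pattern => $name) {
            // Try patterns in order; ignore empty matches to avoid looping forever.
            if (preg_match($pattern, $rest, $matches) !== 1 || $matches[0] === '') {
                continue;
            }
            // Assumption: only patterns with a capturing group yield a token.
            if (isset($matches[1])) {
                $tokens[] = ['name' => $name, 'value' => $matches[1]];
            }
            $offset += strlen($matches[0]);
            $matched = true;
            break;
        }

        if (!$matched) {
            throw new RuntimeException("Unexpected input at offset $offset");
        }
    }

    return $tokens;
}

$tokens = sketchTokenize([
    '/^([0-9]+)/x' => 'T_NUMBER',
    '/^(\+)/x' => 'T_PLUS',
    '/^(-)/x' => 'T_MINUS',
    '/^\s+/' => 'T_SPACE',
], '12 + 3');

foreach ($tokens as $token) {
    echo $token['name'], ' ', $token['value'], PHP_EOL;
}
```

Under these assumptions the sketch prints three lines (`T_NUMBER 12`, `T_PLUS +`, `T_NUMBER 3`); the spaces are consumed but never reach the token list.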
Second, you need a parser for consuming the tokens provided by the lexer. The AbstractParser class contains an abstract method called parseImplementation that receives a TokenStream as an argument.
use Yosymfony\ParserUtils\AbstractParser;
use Yosymfony\ParserUtils\SyntaxErrorException;
use Yosymfony\ParserUtils\TokenStream;
class Parser extends AbstractParser
{
protected function parseImplementation(TokenStream $stream)
{
$result = $stream->matchNext('T_NUMBER');
while ($stream->isNextAny(['T_PLUS', 'T_MINUS'])) {
switch ($stream->moveNext()->getName()) {
case 'T_PLUS':
$result += $stream->matchNext('T_NUMBER');
break;
case 'T_MINUS':
$result -= $stream->matchNext('T_NUMBER');
break;
default:
throw new SyntaxErrorException("Something went wrong");
}
}
return $result;
}
}
Now, you can see the results:
$parser = new Parser($lexer);
$parser->parse('1 + 1'); // 2
The lexer is responsible for recognizing tokens. This one works line by line. If you want to generate a special T_NEWLINE token for each line of the input, call $lexer->generateNewlineTokens() before tokenizing. You can set the name of this special token with the setNewlineTokenName method:
$lexer = new BasicLexer([...]);
$lexer->generateNewlineTokens()
->setNewlineTokenName('T_NL');
$lexer->tokenize('...');
Additionally, there is another special token, T_EOS, that marks the end of the input string. To enable this feature, call $lexer->generateEosToken() before tokenizing. You can set the name of this special token with the setEosTokenName method:
$lexer = new BasicLexer([...]);
$lexer->generateEosToken()
->setEosTokenName('T_MY_EOS');
$lexer->tokenize('...');
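One reason an end-of-input token can be handy: a parser loop like the one shown earlier stops at the first token that is not an operator, so trailing input could go unnoticed. The sketch below is a hypothetical standalone check over plain token names (`sketchIsCompleteExpression` is not a library function). It only assumes that the end marker is the last token of the stream.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check, not library code: accepts a token-name sequence only
 * if it is NUMBER (PLUS|MINUS NUMBER)* followed by the end marker.
 */
function sketchIsCompleteExpression(array $names, string $eosName = 'T_EOS'): bool
{
    $i = 0;
    if (($names[$i++] ?? null) !== 'T_NUMBER') {
        return false;
    }
    while (in_array($names[$i] ?? null, ['T_PLUS', 'T_MINUS'], true)) {
        $i++; // consume the operator
        if (($names[$i++] ?? null) !== 'T_NUMBER') {
            return false;
        }
    }
    // The end marker must come right after the expression, as the final token.
    return ($names[$i] ?? null) === $eosName && $i === count($names) - 1;
}

$complete = sketchIsCompleteExpression(['T_NUMBER', 'T_PLUS', 'T_NUMBER', 'T_MY_EOS'], 'T_MY_EOS');
$trailing = sketchIsCompleteExpression(['T_NUMBER', 'T_NUMBER', 'T_MY_EOS'], 'T_MY_EOS');

var_dump($complete); // bool(true)
var_dump($trailing); // bool(false)
```

In a real parser the same idea would presumably be expressed by matching the end-of-input token after the last expression, so that leftover tokens raise a syntax error instead of being silently ignored.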
The TokenStream class lets you work with the list of tokens returned by the lexer. It provides the following methods:
- moveNext: Moves the pointer one token forward. Returns a Token object, or null if there are no more tokens. e.g. $ts->moveNext()
- matchNext: Matches the next token and returns its value, moving the pointer one token forward. Throws a SyntaxErrorException if the next token does not match. e.g. $number = $ts->matchNext('T_NUMBER')
- isNext: Checks whether the next token matches the token name passed as argument. e.g. $ts->isNext('T_PLUS') // true or false
- skipWhile: Skips tokens while they match the token name passed as argument. The pointer moves "n" tokens forward, up to the last one that matches the token name. e.g. $ts->skipWhile('T_PLUS')
- skipWhileAny: Skips tokens while they match one of the token names passed as argument. The pointer moves "n" tokens forward, up to the last one that matches one of the token names. e.g. $ts->skipWhileAny(['T_PLUS', 'T_MINUS'])
- isNextSequence: Checks whether the following tokens in the stream match the given sequence of token names. e.g. $ts->isNextSequence(['T_NUMBER', 'T_PLUS', 'T_NUMBER']) // true or false
- isNextAny: Checks whether the next token is one of the token names passed as argument. e.g. $ts->isNextAny(['T_PLUS', 'T_MINUS']) // true or false
- hasPendingTokens: Checks whether there are tokens left to consume. e.g. $ts->hasPendingTokens() // true or false
- reset: Resets the stream to the beginning.
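The pointer semantics described in the list above can be sketched with a small cursor over an array. This is a hypothetical stand-in written for illustration, not the library's TokenStream: `SketchStream` is an invented name, tokens are plain arrays rather than Token objects, it throws a generic RuntimeException instead of SyntaxErrorException, and only a subset of the methods is mirrored.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a token stream, not the library's TokenStream.
 * Tokens are plain ['name' => ..., 'value' => ...] arrays.
 */
final class SketchStream
{
    /** @var array[] */
    private $tokens;
    /** @var int index of the last consumed token, -1 before the first one */
    private $index = -1;

    public function __construct(array $tokens)
    {
        $this->tokens = array_values($tokens);
    }

    public function hasPendingTokens(): bool
    {
        return $this->index + 1 < count($this->tokens);
    }

    // Peeks at the next token without moving the pointer.
    public function isNext(string $name): bool
    {
        return $this->hasPendingTokens()
            && $this->tokens[$this->index + 1]['name'] === $name;
    }

    // Moves one token forward; returns null when nothing is left.
    public function moveNext(): ?array
    {
        return $this->hasPendingTokens() ? $this->tokens[++$this->index] : null;
    }

    // Consumes the next token if its name matches, otherwise fails loudly.
    public function matchNext(string $name): string
    {
        if (!$this->isNext($name)) {
            throw new RuntimeException("Expected token $name");
        }
        return $this->tokens[++$this->index]['value'];
    }

    // Advances past every consecutive token with the given name.
    public function skipWhile(string $name): void
    {
        while ($this->isNext($name)) {
            $this->index++;
        }
    }

    public function reset(): void
    {
        $this->index = -1;
    }
}

$ts = new SketchStream([
    ['name' => 'T_PLUS', 'value' => '+'],
    ['name' => 'T_PLUS', 'value' => '+'],
    ['name' => 'T_NUMBER', 'value' => '7'],
]);

$ts->skipWhile('T_PLUS');             // pointer now rests on the second T_PLUS
$number = $ts->matchNext('T_NUMBER'); // consumes the number
$pending = $ts->hasPendingTokens();   // nothing left to read

echo $number, PHP_EOL;
var_dump($pending);
```

Note the distinction the sketch makes explicit: isNext and hasPendingTokens only look ahead, while moveNext, matchNext, and skipWhile advance the pointer.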
Tokens are instances of the Token class, which provides the following methods:
- getName: returns the name of the token. e.g. T_SUM
- getValue: returns the value of the token.
- getLine: returns the line where the token was found.
You can run the unit tests with the following command:
$ cd parser-utils
$ composer test
This library is open-source software licensed under the MIT license.