parser-utils

A library for writing recursive descent parsers in PHP.

Requirements

  • PHP >= 7.1

Installation

The preferred installation method is composer:

composer require yosymfony/parser-utils

An example

First, you need to create a lexer. This one recognizes numbers, plus and minus signs, and whitespace:

use Yosymfony\ParserUtils\BasicLexer;

$lexer = new BasicLexer([
    '/^([0-9]+)/x' => 'T_NUMBER',
    '/^(\+)/x' => 'T_PLUS',
    '/^(-)/x' => 'T_MINUS',
    '/^\s+/' => 'T_SPACE',  // We do not surround it with parentheses because
                            // this is not meaningful for us in this case
]);
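
To make the role of the anchored patterns clearer, here is a minimal, self-contained sketch of how a pattern-to-name table like the one above can drive tokenization of a single line. It is not the library's BasicLexer implementation: the function sketchTokenize is hypothetical, and the choice to use the first capture group as the token value (falling back to the whole match) is an assumption of this sketch.

```php
<?php
// Hypothetical helper for illustration only; not part of yosymfony/parser-utils.
function sketchTokenize(array $terminals, string $line): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($line);

    while ($offset < $length) {
        $rest = substr($line, $offset);
        $matched = false;

        // Try each pattern in declaration order; "^" anchors it to the current position.
        foreach ($terminals as $pattern => $name) {
            if (preg_match($pattern, $rest, $m) === 1 && $m[0] !== '') {
                // Assumption of this sketch: capture group 1 is the value when present,
                // otherwise the whole match is used.
                $tokens[] = ['name' => $name, 'value' => $m[1] ?? $m[0]];
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }

        if (!$matched) {
            throw new RuntimeException("Unexpected input at offset $offset: \"$rest\"");
        }
    }

    return $tokens;
}

$terminals = [
    '/^([0-9]+)/x' => 'T_NUMBER',
    '/^(\+)/x' => 'T_PLUS',
    '/^(-)/x' => 'T_MINUS',
    '/^\s+/' => 'T_SPACE',
];

$tokens = sketchTokenize($terminals, '12 + 3');
$names = array_column($tokens, 'name');
// $names is ['T_NUMBER', 'T_SPACE', 'T_PLUS', 'T_SPACE', 'T_NUMBER']
```

Because every pattern starts with ^, a match can only begin at the current offset, so the first pattern that matches decides the next token.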

Second, you need a parser for consuming the tokens provided by the lexer. The AbstractParser class contains an abstract method called parseImplementation that receives a TokenStream as an argument.

use Yosymfony\ParserUtils\AbstractParser;
use Yosymfony\ParserUtils\SyntaxErrorException;
use Yosymfony\ParserUtils\TokenStream;

class Parser extends AbstractParser
{
    protected function parseImplementation(TokenStream $stream)
    {
        $result = $stream->matchNext('T_NUMBER');

        while ($stream->isNextAny(['T_PLUS', 'T_MINUS'])) {
            switch ($stream->moveNext()->getName()) {
                case 'T_PLUS':
                    $result += $stream->matchNext('T_NUMBER');
                    break;
                case 'T_MINUS':
                    $result -= $stream->matchNext('T_NUMBER');
                    break;
                default:
                    throw new SyntaxErrorException("Something went wrong");
            }
        }

        return $result;
    }
}

Now, you can see the results:

$parser = new Parser($lexer);
$parser->parse('1 + 1');          // 2

The BasicLexer class

The lexer is responsible for recognizing tokens. BasicLexer works line by line. If you want to generate a special T_NEWLINE token for each line of the input, call $lexer->generateNewlineTokens() before tokenizing. You can change the name of this special token with the method setNewlineTokenName.

$lexer = new BasicLexer([...]);
$lexer->generateNewlineTokens()
      ->setNewlineTokenName('T_NL');

$lexer->tokenize('...');

Additionally, there is another special token, T_EOS, that marks the end of the input string. To enable this feature, call $lexer->generateEosToken() before tokenizing. You can change the name of this special token with the method setEosTokenName.

$lexer = new BasicLexer([...]);
$lexer->generateEosToken()
      ->setEosTokenName('T_MY_EOS');

$lexer->tokenize('...');

The TokenStream class

This class lets you work with the list of tokens returned by the lexer.

  • moveNext: Moves the pointer one token forward. Returns a Token object, or null if there are no more tokens. e.g. $ts->moveNext().

  • matchNext: Matches the next token and returns its value. This method moves the pointer one token forward. It throws a SyntaxErrorException if the next token does not match. e.g. $number = $ts->matchNext('T_NUMBER').

  • isNext: Checks whether the next token matches the token name passed as argument. e.g. $ts->isNext('T_PLUS') // true or false.

  • skipWhile: Skips tokens while they match the token name passed as argument. This method moves the pointer "n" tokens forward, stopping at the last consecutive token that matches the token name. e.g. $ts->skipWhile('T_PLUS').

  • skipWhileAny: Skips tokens while they match one of the token names passed as argument. This method moves the pointer "n" tokens forward, stopping at the last consecutive token that matches one of the token names. e.g. $ts->skipWhileAny(['T_PLUS', 'T_MINUS']).

  • isNextSequence: Checks whether the following tokens in the stream match the given sequence of token names. e.g. $ts->isNextSequence(['T_NUMBER', 'T_PLUS', 'T_NUMBER']) // true or false.

  • isNextAny: Checks whether the next token is one of the token names passed as argument. e.g. $ts->isNextAny(['T_PLUS', 'T_MINUS']) // true or false.

  • hasPendingTokens: Checks whether there are tokens left to consume. e.g. $ts->hasPendingTokens() // true or false.

  • reset: Resets the stream to the beginning.
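
The difference between the methods that only peek (isNext, hasPendingTokens) and the ones that move the pointer (matchNext, skipWhile) can be illustrated with a small stand-in. The MiniStream class below is hypothetical and written only for this sketch; it is not the library's TokenStream, it stores tokens as plain [name, value] pairs, and it throws a RuntimeException where the library is documented to throw a SyntaxErrorException.

```php
<?php
// Hypothetical stand-in that mimics the pointer behaviour described above.
final class MiniStream
{
    private $tokens;      // list of [name, value] pairs
    private $index = -1;  // position of the last consumed token

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    // Peek: does the next token have this name? The pointer does not move.
    public function isNext(string $name): bool
    {
        $next = $this->tokens[$this->index + 1] ?? null;

        return $next !== null && $next[0] === $name;
    }

    // Consume the next token and return its value, or fail if the name differs.
    public function matchNext(string $name): string
    {
        if (!$this->isNext($name)) {
            throw new RuntimeException("Expected $name");
        }

        return $this->tokens[++$this->index][1];
    }

    // Consume tokens for as long as the next one has this name.
    public function skipWhile(string $name): void
    {
        while ($this->isNext($name)) {
            $this->index++;
        }
    }

    // Peek: is there at least one token left to consume?
    public function hasPendingTokens(): bool
    {
        return $this->index + 1 < count($this->tokens);
    }
}

$ts = new MiniStream([
    ['T_NUMBER', '1'], ['T_SPACE', ' '], ['T_SPACE', ' '], ['T_PLUS', '+'], ['T_NUMBER', '2'],
]);

$left = $ts->matchNext('T_NUMBER');  // consumes the first T_NUMBER
$ts->skipWhile('T_SPACE');           // consumes both T_SPACE tokens
$sawPlus = $ts->isNext('T_PLUS');    // peeks only, nothing is consumed
$ts->matchNext('T_PLUS');
$right = $ts->matchNext('T_NUMBER');
$pending = $ts->hasPendingTokens();  // false: every token has been consumed
```

In a parser, the peeking methods typically guard a loop or branch, and the consuming methods are called once the branch has been chosen, as in the while loop of the example parser.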

Tokens

Tokens are instances of the Token class, which provides the following methods:

  • getName: returns the name of the token. e.g. T_SUM.
  • getValue: returns the value of the token.
  • getLine: returns the line on which the token was found.

Unit tests

You can run the unit tests with the following command:

$ cd parser-utils
$ composer test

License

This library is open-sourced software licensed under the MIT license.