/Gratt

A Generic Vaughn Pratt's top-down operator precedence parser for .NET Standard

Primary LanguageC#Apache License 2.0Apache-2.0

Gratt

Gratt is a generic implementation of a Pratt parser in C# for .NET Standard.

A Pratt parser is the affectionate name for a simple parsing technique presented by Vaughn Pratt in his 1973 paper “Top down operator precedence” (TDOP).

Gratt was inspired by the design of the Java implementation presented by Bob Nystrom in his journal entry “Pratt Parsers: Expression Parsing Made Easy”. The actual inspiration was the simplification and separation of interfaces in terms of prefix (nud) and infix (led) parselets (rather than the full Java implementation):

interface PrefixParselet {
  Expression parse(Parser parser, Token token);
}

interface InfixParselet {
  Expression parse(Parser parser, Expression left, Token token);
}

The unit tests for Gratt re-create and test the parser for the toy language Bantam discussed in Bob's journal entry.

The unit tests also contain a complete implementation for parsing and evaluating C# pre-processing expressions used in conditional directives #if and #elif.

Unlike Bob's Java implementation, Gratt's parser is completely generic:

class Parser<TState, TKind, TToken, TPrecedence, TResult>
{
    // ...
}

It can therefore work with any model of tokens (TToken), token types (TKind), precedence (TPrecedence) and result (TResult). It can also hold any user-defined state (TState) that may be needed during parsing although it is not required. An overload without any user-state simply assumes a unit.

Unlike Bob's Java implementation, Gratt does not define any interfaces to represent (prefix or infix) parselets. Instead, it relies on them being simple functions:

// version 2.0+

public static TResult
    Parse<TState, TKind, TToken, TPrecedence, TResult>(
        TState state,
        TPrecedence initialPrecedence, IComparer<TPrecedence> precedenceComparer,
        IEqualityComparer<TKind> kindEqualityComparer,
        TKind eoi, Func<TToken, Exception> eoiErrorSelector,
        Func<TKind, TToken, TState, Func<TToken, Parser<TState, TKind, TToken, TPrecedence, TResult>, TResult>> prefixFunction,
        Func<TKind, TToken, TState, (TPrecedence, Func<TToken, TResult, Parser<TState, TKind, TToken, TPrecedence, TResult>, TResult>)?> infixFunction,
        IEnumerable<(TKind, TToken)> lexer)

The above might read a little dense so below is a brief explanation of each parameter in order:

  • a user-defined state that is associated with the parser object during parsing
  • the initial precedence (this is usually 0 if TPrecedence is an int)
  • a precedence comparer
  • an equality comparer that can compare two token kinds
  • a token kind marking end of input (EOI)
  • a function used to project an Exception when the EOI token is not the the last token of the tokens sequence
  • a prefix function
  • an infix function
  • a sequence of token kind and token pairs (2-tuple) yielded by a lexer implementation

A prefix function receives a token kind, a token and a user-defined state as input and it must return a parsing function if the token kind has prefix parsing semantics. If the token kind is invalid as a prefix, then the token and user-defined state may be used for the purpose of providing details in a thrown exception (e.g., while TKind may be a simple enum type, TToken will usually be a rich object containing the position of the token in the source text among other data). The parsing function returned by the prefix function then receives the token and the parser as arguments and produces some result (TResult).

An infix function receives a token kind, a token and a user-defined state as input as arguments and optionally returns a precedence (left binding power) and parsing function pair. The parsing function then receives the token, the left result and the parser as arguments and produces some result (TResult).