SPRDPL (pronounced "spur-DIPple") is a very simple framework for creating lexers and recursive descent parsers. It aims for brevity and readability, providing a reasonably-featured parser and lexer in about 500 lines of code (just over 300 lines if whitespace and comments are excluded).
- Very simple, Pythonic code with pretty decent performance
- Pretty-ish error messages (through `ParseError.print`)
- Self-modifying parsers/lexers
- Lazy parsing support for interactive use
Using `sprdpl` looks a little something like this:
```python
# lex and parse are this library's modules (import path assumed):
from sprdpl import lex, parse

table = {
    'PLUS':       r'\+',
    'MINUS':      r'-',
    'TIMES':      r'\*',
    'DIVIDE':     r'/',
    'POWER':      r'\^',
    # num_type is a helper defined in example.py that converts the matched
    # text into a Python number
    'NUMBER':     (r'[0-9]+(\.[0-9]*)?|\.[0-9]+',
                   lambda t: t.copy(value=num_type(t.value))),
    'LPAREN':     r'\(',
    'RPAREN':     r'\)',
    # A transformation that returns None drops the token from the stream
    'WHITESPACE': (r'[ \t\n]+', lambda t: None),
}
lexer = lex.Lexer(table)

rules = [
    ['atom', 'NUMBER', ('LPAREN expr RPAREN', lambda p: p[1])],
    ['factor', ('atom POWER factor', lambda p: p[0] ** p[2]),
        'atom', ('MINUS factor', lambda p: -p[1])],
    # reduce_binop, also from example.py, folds the repeated operator/operand
    # matches into a single value
    ['term', ('factor ((TIMES|DIVIDE) factor)*', reduce_binop)],
    ['expr', ('term ((PLUS|MINUS) term)*', reduce_binop)],
]
parser = parse.Parser(rules, 'expr')

# 'line' holds the input string to evaluate
result = parser.parse(lexer.input(line))
```
A very simple calculator example (from which this code is taken) is provided in `example.py`.
The lexer is constructed with a table of token name/regular expression pairs, specified with Python's regex syntax. A transformation function can optionally be provided alongside the regex, allowing the token to hold any Python value (like the `NUMBER` example above).
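For instance, with the table and rules above, an end-to-end run looks like this (a quick sketch, assuming `num_type` is `int`):

```python
# The NUMBER transformation means tokens carry real Python numbers, so the
# reduce functions in the rules compute the arithmetic directly.
result = parser.parse(lexer.input('2 * (3 + 4)'))
print(result)   # prints 14
```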
The parser works by translating an EBNF-like grammar directly into a recursive descent parser.
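The `term` and `expr` rules above use repetition (`*`) and alternation (`|`). Here is a plausible sketch of the `reduce_binop` helper they rely on (the real one lives in `example.py`; the shape of the repetition's parse result is an assumption here):

```python
import operator

# Hypothetical sketch of reduce_binop, assuming a repetition like
# ((PLUS|MINUS) term)* shows up in p[1] as a list of [operator, operand]
# pairs, with each operator given as its matched text ('+', '-', etc.).
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def reduce_binop(p):
    result = p[0]
    for op, operand in p[1]:
        result = OPS[op](result, operand)
    return result
```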
More documentation coming soon (maybe). As for sanity checking your grammar: haha, that's a good one! Right now, in the interests of laziness/simplicity, there are basically no sanity checks on the soundness of your grammar (or the soundness of this library). You get what you pay for, I guess.
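In particular, nothing stops you from writing a left-recursive rule, which a recursive descent parser can't handle (a hypothetical example; expect Python's recursion limit to trip):

```python
# Unsound grammar: parsing 'expr' immediately tries to parse 'expr' again,
# recursing until Python raises RecursionError.
rules = [
    ['expr', ('expr PLUS NUMBER', lambda p: p[0] + p[2]),
        'NUMBER'],
]
```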
Ambiguity is handled more-or-less on a greedy basis, as in any recursive descent parser: the possibilities in an alternation (i.e. `rule_1 | rule_2`) are tried in series, with backtracking if a possibility fails to parse. Right now the lexer/parser keep all input in memory to support infinite backtracking.
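The `factor` rule above (reproduced here with comments) shows this ordering in action:

```python
# Alternatives are tried left to right with backtracking: the parser first
# attempts 'atom POWER factor'; if no POWER token follows the atom, it backs
# up and the bare 'atom' alternative succeeds instead.
['factor', ('atom POWER factor', lambda p: p[0] ** p[2]),
    'atom', ('MINUS factor', lambda p: -p[1])],
```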
There are no solid benchmarks at the moment, and there are only a few known uses of this library, all written by me. The best example of a reasonably complicated parser right now is my semi-dormant programming language Mutagen. As a small anecdotal data point, sprdpl was faster when it originally replaced PLY in the Mutagen parser, though I don't remember how much faster, sorry.