PEGParser is a PEG Parser for Julia with Packrat capabilties. PEGParser was inspired by pyparsing, parsimonious, boost::spirit, as well as several others. I was originally writing the EBNF for an entirely different purpose, when it ocurred to me that it wouldn't be too difficult to write a parser. Thus, PEGParsing was born.
To define a grammar you can write:
@grammar <name> begin
rule1 = ...
rule2 = ...
...
end
The following rules can be used:
- Terminals: Strings and characters
- Or:
a | b | c
- And:
a + b + c
- Grouping:
(a + b) | (c + d)
- Optional: ?(a + b)
- One or more:
+((a + b) | (c + d))
- Zero or more:
*((a + b) | (c + d))
- Regular expressions:
r"[a-zA-Z]+"
Multiple: (a+b)^(3, 5)
Suppose you want a parser that takes input and converts [text]
into <b>text<>
. You can write the following grammar:
@grammar markup begin
# this is the standard start rule
start = bold_text
# compose a sequence
bold_text = bold_open + text + bold_code
# use a regular expression to define the text
text = r"[a-zA-z]"
bold_open = '['
bold_close = ']'
end
The first step in using the grammar is to create an AST from a given input:
(node, pos, error) = parse(markup, "[test]")
The variable node
contains the AST which can be transformed to the desired result. To do so, first a mapping of the node names to transform has to established:
html = Dict()
html["bold_open"] = (node, children) -> "<b>"
html["bold_close"] = (node, children) -> "</b>"
html["text"] = (node, children) -> node.value
html["bold_text"] = (node, children) -> join(children)
And finally:
result = transform(html, node)
println(result) # "<b>test</b>"
Transforms can also be used to calculate a value from the tree. Consider the standard calculator app:
@grammar calc begin
start = expr
number = r"([0-9]+)"
expr = (term + op1 + expr) | term
term = (factor + op2 + term) | factor
factor = number | pfactor
pfactor = ('(' + expr + ')')
op1 = '+' | '-'
op2 = '*' | '/'
end
And to use the grammar:
(node, pos, error) = parse(grammar, "5*(42+3+6+10+2)")
math = Dict()
math["number"] = (node, children) -> float(node.value)
math["expr"] = (node, children) ->
length(children) == 1 ? children : eval(Expr(:call, children[2], children[1], children[3]))
math["factor"] = (node, children) -> children
math["pfactor"] = (node, children) -> children[2]
math["term"] = (node, children) ->
length(children) == 1 ? children : eval(Expr(:call, children[2], children[1], children[3]))
math["op1"] = (node, children) -> symbol(node.value)
math["op2"] = (node, children) -> symbol(node.value)
result = transform(math, node)
println(result) # 315.0
This code is not very well tested yet and isn't yet finished, so use at your own risk (I wrote this while on vacation). It is, without any question, a work in progress.
While I have not benchmarked any of this, the hope is that it should be relatively fast. The grammar itself is created using macros. The parsing has very little overhead and additionally has caching. That said, without any benchmarking, this is just conjecture.
Additionally, little to now error checking is performed. The macros probably need some serious review. And while there are a bunch of unit tests, they fall very short from full coverage.
Finally, one thing that I would like to change in the near future is to have transforms look something like:
html(node, children, :bold_open) = "<b>"
html(node, children, :bold_close) = "</b>"
html(node, children, :text) = node.value
html(node, children, :bold_text) = join(children)
result = transform(html, node)
If Julia gets dispatch on value, then this would be trivial to write. One possible workaround is to create a type per rule in the grammar. Then the functions can be written to dispatch on the type associated with the given rule.