Parser combinator library in Golang.
A library to construct top-down recursive backtracking parsers using parser combinators. To learn more about the theory of parser combinators, refer to: http://en.wikipedia.org/wiki/Parser_combinator
This package contains the following components:
- a standard set of combinators; parsing algorithms can build input-specific combinator functions on top of this set.
- a regular-expression based scanner.
- a standard set of terminal tokens.
API reference: https://godoc.org/github.com/prataprc/goparsec
Combinators
Every combinator should conform to the following signatures,
// ParsecNode type defines a node in the AST
type ParsecNode interface{}
// Parser function parses input text, higher order parsers are
// constructed using combinators.
type Parser func(Scanner) (ParsecNode, Scanner)
// Nodify callback function to construct custom ParsecNode.
type Nodify func([]ParsecNode) ParsecNode
Combinators take a variable number of parser functions and return a new parser function.
Nodify is a user-supplied callback function, called by the combinator when its rule matches the current input text. The callback receives the collection of ParsecNode values generated by the combinator's parsers.
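For instance, a Nodify callback can fold the matched nodes into a custom AST node. The sketch below is purely illustrative, assuming the package is imported as parsec; the KV type and the assumption that the rule produced three nodes (key, colon, value) are hypothetical, not part of the library.

// KV is a hypothetical AST node built by a Nodify callback.
type KV struct {
    Key, Value parsec.ParsecNode
}

// nodifyKV assumes its rule matched three nodes: key, colon, value.
func nodifyKV(ns []parsec.ParsecNode) parsec.ParsecNode {
    if len(ns) < 3 {
        return nil
    }
    return &KV{Key: ns[0], Value: ns[2]}
}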
A Parser function takes a new copy of the Scanner instance, positioned at the current cursor, as its argument, and consumes input from the scanner as long as there is a match. If the parser cannot match the input text, it returns the scanner without consuming any input. For example, the following snippet is from the And combinator,
var ns = make([]ParsecNode, 0, len(parsers))
var n ParsecNode
news := s.Clone()
for _, parser := range parsers {
    n, news = doParse(parser, news)
    if n == nil {
        return nil, s
    }
    ns = append(ns, n)
}
return docallback(callb, ns), news
Note that a cloned instance of s is threaded through each parser; when one of the parsers fails (n == nil), And simply returns the original scanner s without consuming any tokens. Otherwise, it returns the new scanner news produced by the last parser.
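The same contract applies to hand-written parsers. Here is a minimal sketch, assuming the scanner's SkipWS() and Match() methods return the matched bytes along with a new scanner; the `^hello` pattern is purely illustrative.

// parseHello consumes the literal "hello" when present; on failure it
// returns the original scanner untouched, so callers can backtrack.
func parseHello(s parsec.Scanner) (parsec.ParsecNode, parsec.Scanner) {
    news := s.Clone()
    _, news = news.SkipWS()
    tok, news := news.Match(`^hello`)
    if tok == nil {
        return nil, s // failure: hand back the scanner we received
    }
    return string(tok), news // success: node and advanced scanner
}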
List of combinators
And
- accepts a collection of parsers that must sequentially match the current input text; the combinator fails when any of the parsers fails to match.
- the Nodify callback is called with a slice of ParsecNodes obtained from each parser; otherwise the callback is ignored.
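Here is a small, hypothetical example of And in action. The assignment grammar, the token pattern and the identity callback are made up for illustration, and the combinator is assumed to take the Nodify callback first, followed by the parsers, as in the snippet above.

package main

import (
    "fmt"

    parsec "github.com/prataprc/goparsec"
)

func main() {
    // assignment := IDENT "=" INT
    eq := parsec.Token(`^=`, "EQ")
    assignment := parsec.And(
        func(ns []parsec.ParsecNode) parsec.ParsecNode { return ns },
        parsec.Ident(), eq, parsec.Int(),
    )
    node, _ := assignment(parsec.NewScanner([]byte("x = 10")))
    fmt.Println(node) // slice of three nodes on success, <nil> on failure
}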
OrdChoice
- accepts a collection of parsers, of which at least one should match the current input text; the combinator fails when all the parsers fail to match.
- the Nodify callback is called with a slice containing a single ParsecNode when one of the input parsers matches the input text; otherwise the callback is ignored.
- once a parser matches the input text, the remaining parsers are not tried.
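A hedged sketch of OrdChoice, reusing the imports from the And example above; the value grammar is illustrative.

// value := INT | STRING | IDENT, tried in that order; first match wins.
value := parsec.OrdChoice(
    func(ns []parsec.ParsecNode) parsec.ParsecNode { return ns[0] },
    parsec.Int(), parsec.String(), parsec.Ident(),
)
node, _ := value(parsec.NewScanner([]byte(`"hello"`)))
fmt.Println(node) // node produced by the first matching alternative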
Kleene
- accepts a pair of parsers, where the first must match the current input text zero or more times and the second, optional, parser acts as a token separator; the Kleene combinator exits when the first parser, or the second parser if specified, fails.
- the Nodify callback is called with a slice of ParsecNodes obtained from every match of the first parser; if there are no matches it is called with an empty slice of ParsecNodes.
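A sketch of Kleene under the same assumptions; the comma-separated list of integers is illustrative, and passing nil for the callback is assumed to return the collected nodes as-is.

// ints := INT ("," INT)*  -- zero or more integers separated by commas.
comma := parsec.Token(`^,`, "COMMA")
ints := parsec.Kleene(nil, parsec.Int(), comma)
nodes, _ := ints(parsec.NewScanner([]byte("10, 20, 30")))
fmt.Println(nodes) // one node per matched integer; empty slice if none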
Many
- accepts a pair of parsers, where the first must match the current input text one or more times and the second, optional, parser acts as a token separator; the repetition exits when the first parser, or the second parser if specified, fails.
- the Nodify callback is called with a slice of ParsecNodes obtained from every match of the first parser; otherwise the callback is ignored.
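Many looks much the same, except that at least one match is required; again a sketch under the same assumptions.

// args := INT ("," INT)+  -- one or more integers separated by commas.
comma := parsec.Token(`^,`, "COMMA")
args := parsec.Many(nil, parsec.Int(), comma)
nodes, _ := args(parsec.NewScanner([]byte("1, 2, 3")))
fmt.Println(nodes) // one node per integer; nil if not even one matched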
Maybe
- accepts a single parser that may or may not match the current input text.
- the Nodify callback is called with a slice containing a single ParsecNode if the parser matches; otherwise the callback is ignored.
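A sketch of Maybe making a sign token optional in front of an integer; the SIGN token and the grammar are illustrative, and the same assumed combinator signatures apply.

// number := [ "+" | "-" ] INT
sign := parsec.Maybe(nil, parsec.Token(`^[+-]`, "SIGN"))
number := parsec.And(nil, sign, parsec.Int())
node, _ := number(parsec.NewScanner([]byte("-42")))
fmt.Println(node)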
Using the builtin scanner
The builtin scanner library manages the input buffer and a cursor into the buffer. Create a new scanner instance,
s := parsec.NewScanner(text)
The scanner library supplies methods like Match(), SkipWS() and Endof(); refer to scanner.go for more information on each of them.
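A minimal sketch of driving the scanner directly, assuming SkipWS() and Match() return the matched bytes along with a new scanner and Endof() reports end of input; the word pattern is illustrative.

s := parsec.NewScanner([]byte("hello world"))
for !s.Endof() {
    _, s = s.SkipWS()
    tok, news := s.Match(`^[a-z]+`)
    if tok == nil {
        break
    }
    fmt.Println(string(tok)) // prints "hello", then "world"
    s = news
}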
Examples
- expr/expr.go, implements a parsec grammar to parse arithmetic expressions.
- json/json.go, implements a parsec grammar to parse JSON documents.
Clone the repository and run the benchmark suite,
$ cd expr/
$ go test -test.bench=. -test.benchmem=true
$ cd json/
$ go test -test.bench=. -test.benchmem=true
To run the example program,
# to parse expression
$ go run tools/parsec/parsec.go -expr "10 + 29"
# to parse JSON string
$ go run tools/parsec/parsec.go -json '{ "key1" : [10, "hello", true, null, false] }'