/peg

Primary language: Go
License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

About

Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator. A Packrat parser is a recursive descent parser capable of backtracking. The generated parser searches for the correct parsing of the input.
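
As a rough illustration of the idea, the sketch below hand-writes a backtracking recursive descent parser for a made-up two-rule grammar and caches each (rule, position) result, which is the general Packrat technique. The grammar, the type names and the function names are all hypothetical, and this is not the code that peg generates; whether and how peg memoizes is not described here.

```go
package main

import "fmt"

// key identifies one (rule, position) pair in the memo table.
type key struct {
	rule string
	pos  int
}

// result records where a rule stopped and whether it matched.
type result struct {
	end int
	ok  bool
}

// parser is a hypothetical hand-written parser for the toy grammar
//
//	start <- ab 'c' / ab 'd'
//	ab    <- 'a' 'b'
type parser struct {
	input string
	memo  map[key]result
	evals int // how many times the body of ab was actually evaluated
}

// literal matches the exact string s at pos.
func (p *parser) literal(pos int, s string) (int, bool) {
	if len(p.input)-pos >= len(s) && p.input[pos:pos+len(s)] == s {
		return pos + len(s), true
	}
	return pos, false
}

// ab matches 'a' 'b', caching the outcome per start position.
func (p *parser) ab(pos int) (int, bool) {
	k := key{"ab", pos}
	if r, seen := p.memo[k]; seen {
		return r.end, r.ok
	}
	p.evals++
	end, ok := p.literal(pos, "a")
	if ok {
		end, ok = p.literal(end, "b")
	}
	if !ok {
		end = pos
	}
	p.memo[k] = result{end, ok}
	return end, ok
}

// start tries the first alternative, then backtracks to pos for the second.
func (p *parser) start(pos int) (int, bool) {
	for _, tail := range []string{"c", "d"} {
		if end, ok := p.ab(pos); ok {
			if end, ok = p.literal(end, tail); ok {
				return end, true
			}
		}
	}
	return pos, false
}

// parse runs the start rule and requires that all input is consumed.
func parse(input string) (matched bool, evals int) {
	p := &parser{input: input, memo: map[key]result{}}
	end, ok := p.start(0)
	return ok && end == len(input), p.evals
}

func main() {
	fmt.Println(parse("abd"))
	fmt.Println(parse("abx"))
}
```

For the input "abd" the first alternative fails at 'c', the parser backtracks to position 0, and the second alternative reuses the cached result for ab instead of evaluating it again.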

For more information see:

This Go implementation is based on:

Usage

-inline
 Tells the parser generator to inline parser rules.
-switch
 Reduces the number of rules that have to be tried for some pegs.
 If statements are replaced with switch statements.
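
To give a feel for what replacing if statements with a switch can buy, here is a small hand-written sketch. It assumes a hypothetical rule keyword <- 'for' / 'go' / 'if' whose alternatives start with different characters; the function names are made up and the code is not taken from peg's actual output.

```go
package main

import (
	"fmt"
	"strings"
)

// matchIfChain tries every alternative of the hypothetical rule
//
//	keyword <- 'for' / 'go' / 'if'
//
// in order, one if statement per alternative.
func matchIfChain(input string) (string, bool) {
	if strings.HasPrefix(input, "for") {
		return "for", true
	}
	if strings.HasPrefix(input, "go") {
		return "go", true
	}
	if strings.HasPrefix(input, "if") {
		return "if", true
	}
	return "", false
}

// matchSwitch looks at the first byte once and only tries the single
// alternative that can possibly start with that byte.
func matchSwitch(input string) (string, bool) {
	if input == "" {
		return "", false
	}
	switch input[0] {
	case 'f':
		if strings.HasPrefix(input, "for") {
			return "for", true
		}
	case 'g':
		if strings.HasPrefix(input, "go") {
			return "go", true
		}
	case 'i':
		if strings.HasPrefix(input, "if") {
			return "if", true
		}
	}
	return "", false
}

func main() {
	for _, in := range []string{"for x", "go", "if y", "foo", ""} {
		a, aok := matchIfChain(in)
		b, bok := matchSwitch(in)
		fmt.Printf("%q: if-chain=(%q,%v) switch=(%q,%v)\n", in, a, aok, b, bok)
	}
}
```

Both functions accept the same inputs; the switch version simply skips alternatives that cannot match the next character.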

Syntax

First declare the package name:

package <package name>

Then declare the parser:

type <parser name> Peg {
	<parser state variables>
}

Next declare the rules. The first rule is the entry point into the parser:

<rule name> <- <rule body>

The first rule should probably end with '!.' to indicate no more input follows:

first <- . !.

'.' matches any single character. To match zero or more characters use:

repetition <- .*

For one or more character matches use:

oneOrMore <- .+

For an optional character match use:

optional <- .?

If specific characters are to be matched use single quotes:

specific <- 'a'* 'bc'+ 'de'?

will match the string "aaabcbcde".
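
The repetition operators can be pictured as simple loops. The following sketch hand-codes the rule above in Go, assuming ASCII input; matchSpecific is a hypothetical helper written for illustration, not something peg emits.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSpecific hand-codes  specific <- 'a'* 'bc'+ 'de'?
// It returns how many bytes were consumed and whether the rule matched.
func matchSpecific(input string) (int, bool) {
	pos := 0

	// 'a'* : zero or more, so this step can never fail.
	for strings.HasPrefix(input[pos:], "a") {
		pos++
	}

	// 'bc'+ : one or more, so at least one repetition is required.
	count := 0
	for strings.HasPrefix(input[pos:], "bc") {
		pos += 2
		count++
	}
	if count == 0 {
		return 0, false
	}

	// 'de'? : optional, consumed only if present.
	if strings.HasPrefix(input[pos:], "de") {
		pos += 2
	}
	return pos, true
}

func main() {
	fmt.Println(matchSpecific("aaabcbcde"))
	fmt.Println(matchSpecific("aaa"))
}
```

The second call fails because 'bc'+ needs at least one "bc", even though 'a'* and 'de'? are both allowed to match nothing.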

For choosing between different inputs use alternates:

prioritized <- 'a' 'a'* / 'bc'+ / 'de'?

will match "aaaa" or "bcbc" or "de" or "". The matches are attempted in order.
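
Ordered choice can be read as "take the first alternative that succeeds". A hand-written sketch of the rule above, with the hypothetical helper matchPrioritized returning the matched prefix, might look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// matchPrioritized hand-codes  prioritized <- 'a' 'a'* / 'bc'+ / 'de'?
// and returns the prefix of input that the rule consumes.
func matchPrioritized(input string) string {
	// Alternative 1: 'a' 'a'*
	if strings.HasPrefix(input, "a") {
		pos := 1
		for strings.HasPrefix(input[pos:], "a") {
			pos++
		}
		return input[:pos]
	}

	// Alternative 2: 'bc'+ (only reached when alternative 1 failed)
	if strings.HasPrefix(input, "bc") {
		pos := 2
		for strings.HasPrefix(input[pos:], "bc") {
			pos += 2
		}
		return input[:pos]
	}

	// Alternative 3: 'de'? is optional, so it succeeds even on no input.
	if strings.HasPrefix(input, "de") {
		return "de"
	}
	return ""
}

func main() {
	for _, in := range []string{"aaaa", "bcbc", "de", "xyz"} {
		fmt.Printf("%q -> %q\n", in, matchPrioritized(in))
	}
}
```

Because the last alternative is optional, the rule as a whole cannot fail; on input such as "xyz" it matches the empty string.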

If the characters are case insensitive use double quotes:

insensitive <- "abc"

will match "abc" or "Abc" or "ABc", and so on.
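
In plain Go, a case-insensitive literal match on ASCII text can be sketched with strings.EqualFold. The helper name matchInsensitive is made up for this example, and the sketch only approximates what a generated parser would do.

```go
package main

import (
	"fmt"
	"strings"
)

// matchInsensitive reports whether input starts with literal, ignoring
// letter case. It slices by byte length, so it assumes ASCII literals.
func matchInsensitive(input, literal string) bool {
	if len(input) < len(literal) {
		return false
	}
	return strings.EqualFold(input[:len(literal)], literal)
}

func main() {
	for _, in := range []string{"abc", "Abc", "ABc", "abd"} {
		fmt.Printf("%q -> %v\n", in, matchInsensitive(in, "abc"))
	}
}
```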

For matching a set of characters use a character class:

class <- [a-z]

will match any single lowercase letter from "a" through "z".

For an inverse character class start with a tilde:

inverse <- [~a-z]

will match any single character except "a" through "z".

If the character class is case insensitive use double brackets:

insensitive <- [[A-Z]]
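
The three kinds of character class can be sketched as single-byte tests. The helpers below are hypothetical and assume ASCII input; they follow the usual PEG convention that a class, including an inverse class, must consume exactly one character and therefore fails at end of input.

```go
package main

import "fmt"

// isLower reports whether c is in the range a-z.
func isLower(c byte) bool { return c >= 'a' && c <= 'z' }

// isUpper reports whether c is in the range A-Z.
func isUpper(c byte) bool { return c >= 'A' && c <= 'Z' }

// matchClass sketches  class <- [a-z]
func matchClass(input string) bool {
	return input != "" && isLower(input[0])
}

// matchInverse sketches  inverse <- [~a-z]
// It still needs one character to consume, so empty input fails.
func matchInverse(input string) bool {
	return input != "" && !isLower(input[0])
}

// matchInsensitiveClass sketches  insensitive <- [[A-Z]]
// which accepts a letter of the range in either case.
func matchInsensitiveClass(input string) bool {
	return input != "" && (isUpper(input[0]) || isLower(input[0]))
}

func main() {
	for _, in := range []string{"q", "Q", "7", ""} {
		fmt.Printf("%q: class=%v inverse=%v insensitive=%v\n",
			in, matchClass(in), matchInverse(in), matchInsensitiveClass(in))
	}
}
```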

Use parentheses for grouping:

grouping <- (rule1 / rule2) rule3

For looking ahead for a match (predicate) use:

lookAhead <- &rule1 rule2

For inverse look ahead use:

inverse <- !rule1 rule2
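
The key property of both predicates is that they test the input without consuming it. The sketch below assumes two hypothetical rules, rule1 matching the literal "ab" and rule2 matching one or more lowercase letters, and hand-codes both forms of look ahead on top of them.

```go
package main

import (
	"fmt"
	"strings"
)

// rule1 is a hypothetical rule that matches the literal "ab".
func rule1(input string, pos int) (int, bool) {
	if strings.HasPrefix(input[pos:], "ab") {
		return pos + 2, true
	}
	return pos, false
}

// rule2 is a hypothetical rule that matches one or more lowercase letters.
func rule2(input string, pos int) (int, bool) {
	end := pos
	for end < len(input) && input[end] >= 'a' && input[end] <= 'z' {
		end++
	}
	return end, end > pos
}

// lookAhead sketches  lookAhead <- &rule1 rule2
// rule1 must match at position 0, but its end position is discarded,
// so rule2 starts from position 0 as well.
func lookAhead(input string) (int, bool) {
	if _, ok := rule1(input, 0); !ok {
		return 0, false
	}
	return rule2(input, 0)
}

// inverse sketches  inverse <- !rule1 rule2
// rule1 must fail at position 0 before rule2 is tried there.
func inverse(input string) (int, bool) {
	if _, ok := rule1(input, 0); ok {
		return 0, false
	}
	return rule2(input, 0)
}

func main() {
	fmt.Println(lookAhead("abc"))
	fmt.Println(inverse("abc"))
	fmt.Println(inverse("xyz"))
}
```

On "abc" the positive predicate succeeds and rule2 then consumes all three letters, including the "ab" that rule1 only inspected.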

Use curly braces for Go code:

gocode <- { fmt.Println("hello world") }

For string captures use angle brackets (less-than and greater-than signs):

capture <- <'capture'> { fmt.Println(buffer[begin:end]) }

Will print out "capture". The captured string is stored in buffer[begin:end].
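
The relationship between begin, end and buffer can be sketched by hand. The example assumes a hypothetical rule greeting <- 'hello ' <[a-z]+> and a made-up helper captureName; only the names buffer, begin and end come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// captureName hand-codes the hypothetical rule
//
//	greeting <- 'hello ' <[a-z]+>
//
// begin is recorded where '<' sits in the rule and end where '>' sits,
// so buffer[begin:end] is exactly the text matched between them.
func captureName(buffer string) (string, bool) {
	const prefix = "hello "
	if !strings.HasPrefix(buffer, prefix) {
		return "", false
	}
	begin := len(prefix)
	end := begin
	for end < len(buffer) && buffer[end] >= 'a' && buffer[end] <= 'z' {
		end++
	}
	if end == begin {
		return "", false // [a-z]+ needs at least one letter
	}
	return buffer[begin:end], true
}

func main() {
	fmt.Println(captureName("hello world!"))
}
```

Text matched outside the angle brackets, here the "hello " prefix, is consumed but not part of the capture.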

Files

  • bootstrap/main.go: bootstrap syntax tree of peg
  • peg.go: syntax tree and code generator
  • main.go: bootstrap main
  • peg.peg: peg in its own language

Testing

There should be no differences between the bootstrap parser and the self-compiled parser:

./peg -inline -switch peg.peg
diff bootstrap.peg.go peg.peg.go

Author

Andrew Snodgrass