ScanRat2 is a mashup of the IronMeta PackRat parsing algorithm, and the concepts of the FParsec and Sprache projects.
Note: ScanRat2 is a fork of ScanRat https://github.com/pragmatrix/ScanRat, with few minor modifications. The code slightly modernized and now is Fable compatible.
- Automatic memoization of the parsing results.
- Direct and mutual left recursion as specified in the paper Left Recursion in Parsing Expression Grammars.
- Computation Expressions to conventiently parse sequences (inspired by Sprache's LinQ SelectMany "hack").
and what's missing:
- RegExp support
- Source indices must be accessible inside the result generators (define an additional production operator?).
To use ScanRat2 in Visual Studio, install the NuGet package or clone this repository and add the ScanRat/ScanRat.fsproj
as a reference to your project.
In your F# source file, add
open ScanRat.ScanRat
and start writing grammars. A grammar is specified as a collection of production rules. The rules are build from a number of combinators.
The generic combinators listed here can be applied to any parsing rule of any type.
-
+
is the sequence combinatorlet twoDigits = digit + digit
is a production rule that parses two digits, not one, not zero not three.
Sequence combinators are left associative, which means that they are combined from left to right. The parsing result type of the
+
sequence combinator is a tuple that contains the parsing result of the left rule and the parsing result of the right production rule.The
+.
and.+
sequence combinators can be used to only process the result of the rule that is placed at the side of the dot. -
|-
is the choice combinatorlet eitherLeftOrRight = left |- right
Parses either left or right. If both rules match the input, the first rule is preferred. Both rules must be of the same result type.
The choice combinator is also left associative, but has a lower priority than the sequence combinators, which means that you can put sequences inside the choice rules without using parenthesis:
let driverDecision = accellerate + overtake |- driveSlowly
-
-->
is the production combinator-->
is used to capture and convert the parsing result. It expects a function that takes the parsing result from the rule on the left side and converts its result.For example, when parsing a two digit number, and
digit
itself is a rule that returns an integer, the resulting number can be computed on the spot:let twoDigitNumber = digit + digit --> fun (digit1, digit2) -> digit1 * 10 + digit2
The string specific combinators are optimized to handle string based input.
-
~~
Parses a simple string. This is an unary combinator that is placed in front of a string to convert it to a parsing rule. The rule's return type is a string.let hello = ~~"Hello"
defines a rule that parses the string "Hello".
-
oneOf
Parses one character of a string. The rule's return type is a character.let oneOrTwo = oneOf "12"
is a shorthand and optimized form of
let oneOrTwo = (~~"1" |- ~~"2") --> fun str -> str.[0]
To conveniently parse a digit and return the integer value of it,
oneOf
can be used like:let digit = oneOf "0123456789" --> fun c -> int(c) - int('0')
Parsing is done by calling the parse
function. Two arguments are required, the first one is the grammar and the second one is the input (a string for now):
let digit = oneOf "0123456789" --> fun c -> int(c) - int('0')
let r = parse digit "3"
The result of a parse is either Success
or Failure
:
match r with
| Success s -> s.Value
| Failure f -> failwith "error"
Because a rule may need to be accessed before the point it has been defined, recursive rules are specified in a slightly different way:
let digit = oneOf "0123456789" --> fun d -> int(d) - int('0')
let digits = production "digits"
digits.rule
<- digits + digit --> fun (a, b) -> a*10+b
|- digit
Here the production
function creates an initially named but empty production rule. After that, the rule's term can then be assigned to production's rule
property. This makes it possible for the term to refer back to the production and - like in this example - specify a left recursive grammar that parses digits.
Combining more than three items with the +
sequence combinator may get a bit annoying, because each new item generates a new tuple that contains the previous result type as its first argument.
For longer sequences, there is an alternative which makes use of Computation Expressions:
The rule:
let addressRule = nameGrammar + streetGrammar + cityGrammar + phoneGrammar --> fun (((name, street), city), phone) -> { Name = name; Street = street; City = city; Phone = phone }
may also be specified by a much more readable and extensible:
parseq {
let! name = nameGrammar
let! street = streetGrammar
let! city = cityGrammar
let! phone = phoneGrammar
return { Name = name; Street = street; City = city; Phone = phone }
}
TBD
- If a direct left and right recursion is used in one rule, the algorithm incorrectly right-associates the parses as noted by Laurence Tratt in his paper.
- The parsers that are built inside a computation expression can not be memoized, but the parsers they refer to, can. Therefore it's more efficient to refer to parsers from inside computation expressions instead of putting them inline.
Many thanks go to
-
Gordon Tishler, the author of the IronMeta project, who implemented the core matching algorithm.
-
Nicolaus Blumhardt for the inspiring Sprache parser combinators I use in a lot of C# projects.