Rupac

A monadic parser combinator library for Ruby capable of parsing complex recursive grammars.

# A simple recursive grammar for tuples
grammar = Grammar.new do
  rule(:value) { quoted_string | number | rule(:tuple) }
  rule(:tuple) { match('(') >> rule(:value) >> match(',') >> rule(:value) >> match(')') }

  rule(:tuple)
end

puts grammar.parse("(1,(2,3))") # => pass
puts grammar.parse("((1,2)")    # => fail
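
To make the recursion in this grammar concrete, here is the same tuple language written as a hand-rolled recursive descent parser in plain Ruby. It is only a sketch and does not use Rupac: the helper names (expect, parse_value, parse_tuple) are hypothetical, quoted strings are left out, and each function returns [value, rest] on success or nil on failure.

```ruby
# Hypothetical helper: consume the literal s from the front of input.
# Returns the remaining input, or nil when input does not start with s.
def expect(input, s)
  input.start_with?(s) ? input[s.length..] : nil
end

# value := number | tuple
def parse_value(input)
  if (m = input.match(/\A\d+/))
    [m[0].to_i, m.post_match]
  else
    parse_tuple(input)
  end
end

# tuple := '(' value ',' value ')'
def parse_tuple(input)
  rest = expect(input, '(')
  return nil unless rest
  first, rest = parse_value(rest)
  return nil unless rest
  rest = expect(rest, ',')
  return nil unless rest
  second, rest = parse_value(rest)
  return nil unless rest
  rest = expect(rest, ')')
  return nil unless rest
  [[first, second], rest]
end

p parse_tuple('(1,(2,3))') # => [[1, [2, 3]], ""]
p parse_tuple('((1,2)')    # => nil
```

The two functions call each other just as the :value and :tuple rules refer to each other. A combinator library takes over the repetitive failure checks between steps.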

Installation

TODO

Why?

Monadic parser combinators let us build lexers (which tokenize input) and parsers (which turn input into something meaningful) by describing a grammar in code that closely mirrors its BNF form. In practice, this means a JSON parser can be written in a few lines of code that are easy to understand and maintain.

Usage

A Grammar defines what you want to parse. It is built from one or more rules. The last expression in the block is the grammar's "root" rule, and the root rule should contain all the other rules, either directly or through the rules it refers to.

require 'rupac'
grammar = Grammar.new do
  rule(:the_number_one) { match('1') }
  rule(:the_number_one) # root rule
end

Rules are made from parsers:

match('abc')  # matches the string "abc"
number        # matches numbers like "1234"
whitespace    # matches whitespace
...

Parsers can be combined using combinators to form new parsers. By combining the base parsers you can create the building blocks for parsing more complex strings.

match('hello') >> match('!')   # matches "hello" and then "!"
match('good') > match('night') # optionally matches "good" and always matches "night"
many1(alphanumeric)            # matches a string of alphanumeric characters
...
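
As a rough illustration of what sequencing and choice do, the sketch below models a parser as a plain lambda that takes the input and returns [result, residual] on success or nil on failure. This representation and the names lit, seq and choice are assumptions made for the example; they are not Rupac's internals, where the same ideas are exposed as match, >> and |.

```ruby
# Hypothetical literal parser: succeeds when input starts with expected.
def lit(expected)
  lambda do |input|
    input.start_with?(expected) ? [expected, input[expected.length..]] : nil
  end
end

# Sequence: run first, then run second on whatever first left over.
# Fails as soon as either step fails.
def seq(first, second)
  lambda do |input|
    left = first.call(input)
    return nil if left.nil?
    right = second.call(left[1])
    return nil if right.nil?
    [left[0] + right[0], right[1]]
  end
end

# Choice: try first, and fall back to second on the same input.
def choice(first, second)
  lambda { |input| first.call(input) || second.call(input) }
end

hello_bang = seq(lit('hello'), lit('!'))
greeting   = choice(lit('hi'), hello_bang)

p hello_bang.call('hello!?') # => ["hello!", "?"]
p hello_bang.call('hello')   # => nil
p greeting.call('hi there')  # => ["hi", " there"]
```

Because every combinator returns another lambda of the same shape, the results can be combined again without limit.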

Once a grammar has been specified, additional processing rules can be defined to transform the input. For example, you may wish to convert a string into a number or a CSV row into a Ruby array.

grammar.process_rules do
  process(:the_number_one) { |v| v.to_i }
end

Parsing is done via the grammar's parse method, which returns an algebraic data type, ParserResult, representing either a success or a failure. On success, it contains the result (what was parsed) and the residual (the remainder of the input). On failure, the residual is the original input.

r = grammar.parse('123')
puts r.passed?  # => true
puts r.result   # => 1
puts r.residual # => 23

r = grammar.parse('321')
puts r.failed?  # => true
puts r.residual # => 321
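
The shape of such a result can be sketched with a small Struct. This is a stand-in for illustration only: SketchResult and parse_one are hypothetical names, and the real ParserResult class may be built quite differently. Only the passed?, failed?, result and residual readers are taken from the example above.

```ruby
# Hypothetical success-or-failure value with the readers used above.
SketchResult = Struct.new(:status, :result, :residual) do
  def passed?
    status == :pass
  end

  def failed?
    !passed?
  end
end

# Hypothetical parser for a leading "1", converted to an Integer.
def parse_one(input)
  if input.start_with?('1')
    SketchResult.new(:pass, 1, input[1..])
  else
    # On failure nothing is consumed, so the residual is the whole input.
    SketchResult.new(:fail, nil, input)
  end
end

ok  = parse_one('123')
bad = parse_one('321')

puts ok.passed?   # => true
puts ok.residual  # => 23
puts bad.failed?  # => true
puts bad.residual # => 321
```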

Parsers

  • alphanumeric: parses an alphanumeric character.
  • any_except(string): parses any character except the characters in the provided string.
  • any(string): parses any character in the provided string.
  • digit: parses a digit.
  • eof: expects only whitespace in the remaining input.
  • fail: always fails and returns the input as the residual.
  • match(string): matches the provided string.
  • number: parses a number.
  • quoted_string: parses anything between double quotes ("").
  • success: always succeeds.
  • whitespace: consumes all whitespace.
  • whitespace(string): consumes all whitespace around the provided string.

Combinators

  • parser1 | parser2: "chooses" parser1 if it succeeds, otherwise, parser2.
  • lookahead(parser): runs the provided parser without consuming input.
  • many0(parser): attempts to run the parser zero or more times.
  • many1(parser): expects one or more successful parser runs.
  • parser1 > parser2: optionally tries parser1 and always runs parser2.
  • parser1 < parser2: always runs parser1 and optionally tries parser2.
  • parser1 >> parser2: runs parser1 and then parser2.
  • skip_to(parser): keeps consuming input until parser succeeds.
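
The repetition and lookahead combinators can be sketched in a few lines each. As an assumption for this example, a parser is a lambda returning [result, residual] on success or nil on failure; the digit, many0, many1 and lookahead defined here only share their names with the list above and are not Rupac's implementations.

```ruby
# Hypothetical single-digit parser.
def digit
  lambda do |input|
    input.match?(/\A\d/) ? [input[0], input[1..]] : nil
  end
end

# Zero or more runs: collect results until the parser fails. Never fails.
def many0(parser)
  lambda do |input|
    results = []
    rest = input
    while (step = parser.call(rest))
      break if step[1] == rest # guard: stop if no input was consumed
      results << step[0]
      rest = step[1]
    end
    [results, rest]
  end
end

# One or more runs: like many0, but an empty result list is a failure.
def many1(parser)
  lambda do |input|
    out = many0(parser).call(input)
    out[0].empty? ? nil : out
  end
end

# Run the parser, keep its result, and hand back the input untouched.
def lookahead(parser)
  lambda do |input|
    step = parser.call(input)
    step && [step[0], input]
  end
end

p many0(digit).call('12ab')   # => [["1", "2"], "ab"]
p many1(digit).call('ab')     # => nil
p lookahead(digit).call('7x') # => ["7", "7x"]
```

The guard in many0 matters when the wrapped parser can succeed without consuming anything, since the loop would otherwise never end.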

Utility

  • parser.map { }: transforms the parsed result on success using the block's return value.
  • parser.flat_map { }: passes the parsed result to the block on success; the block returns the parser to run next. Monadic bind.
  • parser.pure(value): lifts a value into the parser. The parser will always return this value.
  • parser.apply(value): applies the function contained by the parser to the provided value.
  • parser.name(string): returns a new parser that will use the provided name for debugging and printing.
  • parser.error(string): returns a new parser that will use the provided error for debugging.
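
The difference between map and flat_map is easiest to see in a toy implementation. The SketchParser class below is hypothetical and exists only for this example: it wraps a block that returns [result, residual] or nil, and it defines pure as a class method for brevity, whereas the list above describes it as a method on a parser.

```ruby
class SketchParser
  def initialize(&run)
    @run = run
  end

  def call(input)
    @run.call(input)
  end

  # Always succeeds with value and consumes no input.
  def self.pure(value)
    new { |input| [value, input] }
  end

  # Transform the result of a successful parse; failures pass through.
  def map(&fn)
    SketchParser.new do |input|
      step = call(input)
      step && [fn.call(step[0]), step[1]]
    end
  end

  # The block receives the result and returns the parser to run next
  # on the residual (monadic bind).
  def flat_map(&fn)
    SketchParser.new do |input|
      step = call(input)
      step && fn.call(step[0]).call(step[1])
    end
  end
end

digit = SketchParser.new do |input|
  input.match?(/\A\d/) ? [input[0], input[1..]] : nil
end

as_int   = digit.map { |d| d.to_i }
pair_sum = as_int.flat_map { |a| as_int.map { |b| a + b } }

p as_int.call('4x')                # => [4, "x"]
p pair_sum.call('34!')             # => [7, "!"]
p SketchParser.pure(:x).call('ab') # => [:x, "ab"]
```

map can only change the value a parser produces, while flat_map lets an earlier result decide which parser runs next.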

Examples

See /examples.

Development

  • Setup: bin/setup
  • Console: bin/console

Tests

  • rake spec

License

The gem is available as open source under the terms of the MIT License.