Beware!

All the important features of this project (and many other useful ones) are now supported by Marpa::R2's Scanless interface (SLIF). The rationale and todos are completely outdated and irrelevant; the code, however, may well still work thanks to Marpa::R2's excellent backward compatibility.

So this repo is left here purely for illustrative and archival purposes. What exactly it illustrates is up to the reader. :) The author explicitly declares this code public domain for all intents and purposes.

MarpaX-Parse, MarpaX::Tool-to-be

Parts of this module will be refactored out into individual modules and probably distros as MarpaX::Tool::*, if they prove to be useful enough.

What It Is

This module aims to serve as a simple and powerful parsing interface to Marpa::R2 so that a user can:

  • set the 'rules' argument of Marpa::R2::Grammar to a string containing a BNF or EBNF grammar (which may embed %{ ... %} actions),
  • call the parse method on the input and receive the value produced by the Marpa::R2 evaluator from the embedded %{ ... %} actions in the textual grammar, or from closures (sub { ... }, rather than semantic action names) set in Marpa::R2 rules,
  • have literals extracted from the textual grammar or Marpa::R2 rules and set up as regexes for lexer rules to tokenize the input for the recognizer,
  • set default_action to 'tree', 'xml', 'sexpr', 'AoA', or 'HoA' to have parse return a parse tree (Tree::Simple, XML string, S-expression string, array of arrays, or hash of arrays, respectively),
  • call show_parse_tree($format) to view the parse tree as a text dump, HTML, or formatted XML,
  • use Tree::Simple::traverse, Tree::Simple::Visitor or XML::Twig to traverse the relevant parse trees and obtain results.
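
The literal-extraction step in the list above can be pictured with a minimal sketch. It is not the module's actual code: the convention of single-quoting literals inside a rule's right-hand side and the helper name literal_lexer_rules are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Rules in Marpa::R2's array-ref style: [ lhs => [ rhs symbols ] ].
# Marking literals as 'x' inside the RHS is an assumption of this sketch.
my @rules = (
    [ expr => [ 'term', q{'+'}, 'expr' ] ],
    [ expr => [ 'term' ] ],
    [ term => [ q{'('}, 'expr', q{')'} ] ],
    [ term => [ 'number' ] ],
);

# Hypothetical helper: collect each quoted literal once and pair it
# with a regex that matches the literal at the current scan position.
sub literal_lexer_rules {
    my (@rules) = @_;
    my %seen;
    my @lexer;
    for my $rule (@rules) {
        for my $symbol ( @{ $rule->[1] } ) {
            next unless $symbol =~ /\A'(.+)'\z/s;    # skip non-literals
            my $literal = $1;
            next if $seen{$literal}++;               # keep first occurrence only
            push @lexer, [ $literal, qr/\G\Q$literal\E/ ];
        }
    }
    return @lexer;
}

my @lexer = literal_lexer_rules(@rules);
print join( ' ', map { $_->[0] } @lexer ), "\n";    # prints: + ( )
```

Each pair holds a token type (here simply the literal itself) and the regex a lexer would try against the input; non-literal symbols such as number would still need lexer rules of their own.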

Input can be a string or a reference to an array of tokens ([ $type, $value ] refs).

Ambiguous tokens can be defined by setting the input array item(s) to [ [ $type1, $value ], [ $type2, $value ] ] ...; they are handled with Marpa::R2's alternative()/earleme_complete() input model.
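
As an illustration of that input format, here is a small sketch in plain Perl. The token type names and the helper expand_tokens are hypothetical and are not taken from the module; the sketch only shows how unambiguous and ambiguous items can sit in the same array and be normalized to a list of alternatives per input position.

```perl
use strict;
use warnings;

# Hypothetical token stream; the type names are made up for this sketch.
my $tokens = [
    [ [ 'Noun', 'time' ], [ 'Verb', 'time' ], [ 'Adjective', 'time' ] ],
    [ [ 'Noun', 'flies' ], [ 'Verb', 'flies' ] ],
    [ 'Preposition', 'like' ],    # unambiguous: a plain [ $type, $value ] pair
];

# Hypothetical helper: turn every input item into a list of
# [ $type, $value ] alternatives, one list per input position.
sub expand_tokens {
    my ($tokens) = @_;
    my @positions;
    for my $item (@$tokens) {
        # An ambiguous item is an array ref whose first element is an array ref.
        if ( ref( $item->[0] ) eq 'ARRAY' ) {
            push @positions, [ @$item ];
        }
        else {
            push @positions, [ $item ];
        }
    }
    return \@positions;
}

my $position = 0;
for my $alternatives ( @{ expand_tokens($tokens) } ) {
    # In Marpa::R2 terms: one alternative() call per reading at this
    # position, then a single earleme_complete() to advance.
    printf "%d: %s\n", $position++,
        join ' | ', map { sprintf '%s(%s)', @$_ } @$alternatives;
}
```

The loop only prints the alternatives; a real driver would pass them to the recognizer at the point marked in the comment.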

Feature => Test(s)

Marpa::R2::Grammar rule transforms to handle quantified (?|*|+) symbols

Extraction of closures and lexer regexes from Marpa::R2::Grammar rules

An example from the Parse::RecDescent tutorial, done the Marpa way

A BNF grammar with actions that can parse a possibly signed decimal number

A BNF grammar that can parse a BNF grammar that can parse a decimal number

An example from the Parse::RecDescent tutorial done in textual BNF with embedded actions

Parse trees generation and traversal

Comparison of parse tree evaluation

Parsing the sentence 'time flies like an arrow, but fruit flies like a banana', with part-of-speech data taken from WordNet::QueryData (if installed) or a pre-set hash ref (otherwise)

Pre-requisites:

Core (closures/lexer regexes extraction, quantified symbols, textual BNF with embedded actions, see test cases 02-07, 08 for details)

Marpa::R2
Clone
Eval::Closure
Math::Combinatorics

Parse Trees (set default_action to 'xml', 'tree', 'sexpr' or 'AoA' to have XML string, Tree::Simple, S-expression or array of arrays parse trees, respectively; use show_parse_tree("text" or "html") to view Tree::Simple parse trees as text or HTML; see test cases 10, 11 and 13 for details)

Data::TreeDumper
Tree::Simple
	Tree::Simple::Visitor
	Tree::Simple::View
XML::Twig