/epeg

Epeg is a library to read PEGs from textfiles and apply them to input to create abstract syntax trees.

Primary LanguageLuaGNU General Public License v3.0GPL-3.0

Introduction
============

Epeg is a library to read PEGs from textfiles and apply them to input to create
abstract syntax trees.

Features:
- The complete PEG can be stored in a single plain textfile.
- You define in the PEG itself how the AST should look and what input you
  want to store in the nodes.
- You can print parsing errors for malformed input.
- You can print warnings for input which is valid, but deprecated/
  dangerous/... .
- You can call functions to implement actions or extended tests.
- You can dump the grammar into a lua-table instead of reading the PEG
  everytime from textfile, which enhances performance.
- The AST stores for every node at which position/row/column in the inputstream
  it got created.
- Classes use lua syntax for patternmatching (for example [^%l] or [%L])
- Lua escapesequences get converted into their binary counterparts while
  reading the PEG (only single characters).
- Checks for possible left infinite recursion while parsing the PEG.


Note: If you are only interested in running actions while parsing input, you
should consider lpeg instead, because it is much better for this kind of work.


Usage
=====

- store the PEG in a textfile
- use epeg = dofile("epeg.lua") to include the library (not a fancy lua module)
- use epeg.read_grammar("example.epeg") to get the grammar
- use epeg.parse_text(grammar, text) to parse some text and get an AST
- use epeg.dump_grammar(grammar, name, filename) to dump the grammar as
  lua table and retrieve it later for better runtime performance, instead of
  reading the PEG from textfile everytime.


Files
=====

epeg.lua        - the library
epeg.epeg       - the PEG for PEG itself (used for development)
epegscanner.lua - use it to dump the grammar from epeg.epeg
                  (result stored in r1.lua)
TODO            - tasks planned/possible
README          - this file
COPYING         - copyright notices


Handling Errors and Warnings
============================

If you want to handle errors and warnings, just append
'error ("message")' or 'warning ("message")' aftern a pattern. You can't
append both (yet.):

FunctionName <- ([a-z][A-Z]*) error ("Expected at least one letter.")
Pragma       <- 'pragma' [%s] warning ("The use of #pragma is deprecated.")

If a pattern with an error attribute fails, the parser aborts and prints the
error message.

You can't append both error/warning to a node.


Grabbing Nodes and Input
========================

Nodes are represented by tables in lua. The following indexes are used for
every node and cannot be used as nodenames:

position/
row/
column: stores the position where the node in the input stream
        got created.
parent: a backlink to the parent node or nil if no parentnode exists.
pindex: index of the node in it's parent.
        Thus node.parent[node.pindex] == node.
string: grabbed input.


'< Pattern >' grabs the matched pattern and appends the result in the actual
node to node.string (initialized with ""). If Pattern fails, nothing gets
stored. You can nest this construct.

'<NodeName: Pattern >' creates a new node with the name NodeName as a childnode
of the actual node. If a childnode with the same name already exists, the
operation fails and the parser stops with an errormessage. For Pattern the
actual node becomes the newly created node.

'<NodeName::"String">' is similar to '<:NodeName: Pattern>', but stores a
fixed String as content of the node in node.string.

'<: Pattern >' creates a node with no name. The name is set to the lowest free
positive integer in the parentnode starting with 1. This is useful to create
lists of nodes.

'<::"String">' is similar to '<: Pattern>' but creates a node with a fixed
string as content.


Calling Functions
=================

You can call functions from the PEG to handle certain situations. The function
gets as first argument the parser with the actual position (position/row/column)
and as second argument the actual node.

To add a functioncall use {: FunctionName }. If you want a function to check
for a condition, add a &/! in front of it:

&{: check_condition }


Standard PEG constructs
=======================

(You should read peg-popl04 for a thorough understanding)

Examples:

rule declaration: Rulename <- ...
class:            [a-zA-Z_.]
choice:           "hello" / "goodbye"
subpatterns:      (!r1 r2* / r3)+ / (r4)* r5


Library
=======

You can use the following functions in the library:

epeg.read_grammar(filename)

Reads a PEG from file filename and returns the grammar. Returns false and
prints error messages to stdout if fails.

epeg.parse_text(grammar, text)

Parses the text using grammar. Grammar has to be a valid grammar created by
epeg.read_grammar or stored by epeg.dump_grammar. Returns the AST if successful,
else false. Prints all messages to stdout.

epeg.dump_grammar(grammar, name, filename)

Dumps the grammar as lua-table in file filename with the name name. if name
is not a string, it will use 'return {...' instead of 'name = {...'. If
filename is not given the output will go to stdout. Prints all messages to
stdout. If you want to use the dumped grammar, just use the table as the
grammar argument with parse_text next time.


All other functions are either for debugging purposes or get called by these 3
functions.