This is a part of bachelor thesis: Parsing Based on Tree-Controled Grammars, Brno University of Technology 2016.
This CD contains:
CD
|- src - application source files
|- tcgp.py
\- ....
|- tests - tests
|- test.sh
\- ...
|- docs - thesis source files
|- thesis.tex
\- ...
|- thesis.pdf - thesis in pdf format
|- MIT_licence.txt
Basic usage is for verification, that text string belongs to specified grammar. This parser expands common LR parser, so it allows parsing of some non-context free grammars.
Source files are placed in ./src/
folder.
For execution use python3.
Usage:
python3 tcgp.py [-h] -g GRAMMAR [-p CHOICE [CHOICE ...]] [-i INPUT] [-o OUTPUT]
Tree controlled grammar parser
optional arguments:
-h, --help show this help message and exit
-g GRAMMAR, --grammar GRAMMAR
Input grammar file
-p CHOICE [CHOICE ...], --print CHOICE [CHOICE ...]
Decide what to print from these CHOICES:
- tree: final derivation tree
- trees: derivation tree development
- stack: continuously print stack of symbols
- rules: continuously print applied rules
- groups: lr groups generated from rules
- table: lr table
- eff: empty, first and follow sets
- automat: print final state machine
- precedence: print precedence table
- grammar: print input grammar
- scanner: print input scanner automat
- all: print all
-i INPUT, --input INPUT
Input string file, <stdin> if not present
-o OUTPUT, --output OUTPUT
Output file, <stdout> if not present
This file specifies tree controlled grammar.
Mandatory part is controlled grammar, which syntax is following. This part must be defined first:
grammar = (
{<id>, <id>, ..., <id>}, # nonterminals
{<str>, <str>, ..., <str>}, # terminals
{ # rules
<id> -> [<id>|<str>]* ;
<id> -> [<id>|<str>]* ;
...
<id> -> [<id>|<str>]* ;
},
<id> # start symbol
)
Other optional parts:
levels
- define control language by enumeration
levels = {
[<str>]* ; # listed symbols
[<str>]* ;
...
[<str>]* ;
}
automaton
- define control language by finite automaton
automaton = (
{<id>, <id>, ..., <id>}, # states
{ # rules
<id> <str> -> <id> ;
<id> <str> -> <id> ;
...
<id> <str> -> <id> ;
},
<id>, # start state
{<id>, <id>, ..., <id>} # final states
)
precedence
- define precedence rules for operators priority
precedence = (
<dir>: [<id>|<str>], [<id>|<str>], ..., [<id>|<str>] ;
<dir>: [<id>|<str>], [<id>|<str>], ..., [<id>|<str>] ;
...
<dir>: [<id>|<str>], [<id>|<str>], ..., [<id>|<str>] ;
)
Meaning of shortcuts in syntax of grammar file:
<id>
- c-like id<str>
- string bounded by simple quotes ('
), quote can be escaped by\
<dir>
- associativity direction in precedence table, values:left
- left associativityright
- right associativity
You can also write single line comments starting with #
symbol.
Input string is expected to contain grammar terminals.
White chars are ignored (used as separators).
Terminals don't have to be separated by white chars,
but there can be bad interpretation, if there are multiple
ways how to interpret string (e.g. two terminals a
and aa
).
Depending on input, application returns one of these exit codes:
- 0: Input string belongs to input grammar.
- 1:
NOT_IN_GRAMMAR
- Input string doesn't belong to input grammar. - 2:
NONDETERM_ERROR
- Nondeterministic step has been applied and then we ran into error. It is not clear, if string belongs to grammar. This problem is described closely in Bc. thesis. - 3:
GRAMMAR_PARSE_ERROR
- Syntax or logical error in input grammar file. - 4:
LR_TABLE_ERROR
- Conflict or other problem in LR table. - 5:
FINITE_AUTOMAT_ERROR
- Logical error in user defined Finite automat. - 10:
ARGUMENTS_ERROR
- Arguments error. - 99:
INTERNAL_ERROR
- Other internal error.
Test suite is placed in ./tests/
folder and contains many example grammars, including all grammars mentioned in the Bc. thesis.
You can run all tests by bash script by typing ./test.sh
in tests folder.
Source files of thesis are placed in ./docs/
folder. For compiling
to .pdf format just type make
in documentation folder. Package pdflatex
is required.
Content of this CD (including all source files) is protected by MIT licence.