/rltools

regular language tools - automata-based tokenizer, LL(1) parser

Primary LanguagePython

UFSC/CTC/INE
INE5421 - Formal Languages and Compilers (2015/2)
Gustavo Zambonin (13104307) & Matheus Ben-Hur de Melo Leite (13100765)

Practice I - Algorithms for Manipulation of Regular Languages

The language chosen to code these algorithms was Python, for it is easier to
manipulate object types dynamically. Almost all the information related to the
program's workings may be found through its manpage, with the command

    man ./rltools

where it is located (inside this folder). Invalid arguments will be rejected by
the program when possible.

A semi-automated test file was placed on the associated folder for these
operations, making it easier to claim the consistency of the I/O operations.
Powered by Bash scripting, it executes determinizations and conversions from
and to finite automata, regular grammars and regular expressions consecutively
using the same outputs it created earlier in a cyclic manner, asserting the
files' contents regularity.

    sh autotester.sh <non-deterministic finite automaton input file>

Finally, some design rules for an input automaton file are described below.
Grammar and expression files' structure is alike. Examples for those can be
generated using the above script.

* epsilon-moves must be added only to the required states;
* a state must consist of all the transitions through letters of the alphabet,
even if those are empty.

The regular expression parser accepts only unary letters alphabets. Hence, an
expression of the form "(q0|q1*)" will be parsed with the set {'q', '0', '1'}
as its symbols, and won't match a possible desired {'q0', 'q1'} set.
Parenthesis should be used whenever possible when describing regular
expressions. The code itself should be executed using Python 3.x.

                                                                    02/10/2015

Original assignment: e65b8a1b03c7be85279dcd6236d3064d892b6ffd

Practice II - Lexical Analysis

The lexical structure [1], specifically built for this assignment, is presented
below, with some remarks about its inner workings. It can be used as a guide to
write possible "programs" with this proto-language, albeit syntactically and
semantically incorrect. Remember that the input file must end with a new line,
and separators are essential to the scanning of the tokens (1+1 != 1 + 1).

    keyword      ::= 'else' | 'if' | 'while' | 'read' | 'write' | 'list' |
                     'bool' | 'str' | 'int' | boolean
    boolean      ::= 'False' | 'True'
    identifier   ::= letter (letter | digit | '_')* [2]
    letter       ::= lowercase | uppercase
    lowercase    ::= 'a' | 'b' | ... | 'z'
    uppercase    ::= 'A' | 'B' | ... | 'Z'
    digit        ::= '0' | '1' | ... | '9'
    nonzerodigit ::= '1' | ... | '9'
    char         ::= any ASCII character between 33 and 126, with the
                     exception of 34, 40, 41, 42, 92 and 124 [3]
    string       ::= '"'char*'"'
    integer      ::= nonzerodigit digit* | '0'
    arit_op      ::= '+' | '-' | '×' | '/'
    logic_op     ::= 'and' | 'or' | 'not'
    comp_op      ::= '<' | '>' | '==' | '>=' | '<=' | '!='
    attr_op      ::= '=' | '->' | ':='

[1] Loosely based on:
https://inst.eecs.berkeley.edu/~cs164/fa11/python-grammar.html

[2] It must not match a keyword.

[3] More specifically, the ASCII indexes noted are in base 10. The list
starts at ! and ends with ~, excluding ", (, ), *, \ and |. Note that a string
must be enclosed within double quotes. More information about this character
set can be found on its manpage using the command

    man ascii

on a Unix-based OS.

                                                                    08/11/2015

Original assignment: 155d45589027987b6e3b840f9799918eda49f1f5

Practice III - Parsing

An LL(1) parser was created using a grammar constructed manually through
first/follow sets. The documents describing that process are stored inside
the `docs/` folder. It derives its productions according to the resulting
parsing table, programmed as a Python executable.

                                                                    30/11/2015

Original assignment: 22d84ed0d365fff847148db1256eb4d1b5afbd38