UFSC/CTC/INE INE5421 - Formal Languages and Compilers (2015/2) Gustavo Zambonin (13104307) & Matheus Ben-Hur de Melo Leite (13100765) Practice I - Algorithms for Manipulation of Regular Languages The language chosen to code these algorithms was Python, for it is easier to manipulate object types dynamically. Almost all the information related to the program's workings may be found through its manpage, with the command man ./rltools where it is located (inside this folder). Invalid arguments will be rejected by the program when possible. A semi-automated test file was placed on the associated folder for these operations, making it easier to claim the consistency of the I/O operations. Powered by Bash scripting, it executes determinizations and conversions from and to finite automata, regular grammars and regular expressions consecutively using the same outputs it created earlier in a cyclic manner, asserting the files' contents regularity. sh autotester.sh <non-deterministic finite automaton input file> Finally, some design rules for an input automaton file are described below. Grammar and expression files' structure is alike. Examples for those can be generated using the above script. * epsilon-moves must be added only to the required states; * a state must consist of all the transitions through letters of the alphabet, even if those are empty. The regular expression parser accepts only unary letters alphabets. Hence, an expression of the form "(q0|q1*)" will be parsed with the set {'q', '0', '1'} as its symbols, and won't match a possible desired {'q0', 'q1'} set. Parenthesis should be used whenever possible when describing regular expressions. The code itself should be executed using Python 3.x. 02/10/2015 Original assignment: e65b8a1b03c7be85279dcd6236d3064d892b6ffd Practice II - Lexical Analysis The lexical structure [1], specifically built for this assignment, is presented below, with some remarks about its inner workings. It can be used as a guide to write possible "programs" with this proto-language, albeit syntactically and semantically incorrect. Remember that the input file must end with a new line, and separators are essential to the scanning of the tokens (1+1 != 1 + 1). keyword ::= 'else' | 'if' | 'while' | 'read' | 'write' | 'list' | 'bool' | 'str' | 'int' | boolean boolean ::= 'False' | 'True' identifier ::= letter (letter | digit | '_')* [2] letter ::= lowercase | uppercase lowercase ::= 'a' | 'b' | ... | 'z' uppercase ::= 'A' | 'B' | ... | 'Z' digit ::= '0' | '1' | ... | '9' nonzerodigit ::= '1' | ... | '9' char ::= any ASCII character between 33 and 126, with the exception of 34, 40, 41, 42, 92 and 124 [3] string ::= '"'char*'"' integer ::= nonzerodigit digit* | '0' arit_op ::= '+' | '-' | '×' | '/' logic_op ::= 'and' | 'or' | 'not' comp_op ::= '<' | '>' | '==' | '>=' | '<=' | '!=' attr_op ::= '=' | '->' | ':=' [1] Loosely based on: https://inst.eecs.berkeley.edu/~cs164/fa11/python-grammar.html [2] It must not match a keyword. [3] More specifically, the ASCII indexes noted are in base 10. The list starts at ! and ends with ~, excluding ", (, ), *, \ and |. Note that a string must be enclosed within double quotes. More information about this character set can be found on its manpage using the command man ascii on a Unix-based OS. 08/11/2015 Original assignment: 155d45589027987b6e3b840f9799918eda49f1f5 Practice III - Parsing An LL(1) parser was created using a grammar constructed manually through first/follow sets. The documents describing that process are stored inside the `docs/` folder. It derives its productions according to the resulting parsing table, programmed as a Python executable. 30/11/2015 Original assignment: 22d84ed0d365fff847148db1256eb4d1b5afbd38