/Tiger-Compiler-SML

Primary LanguageStandard MLMIT LicenseMIT

Compiler-SML An implementation of a compiler for the Tiger Language

Compiler Construction

ECE/CS 553

Duke University

Part.1--Lexical Analysis

Due Data: 01/27/2014

Team Members: Kuang Han, Dan Deng, Ang Li

In this part of the compiler, we implemented features for a working compiler that contains fully functional rules for translating an input string into a sequences of tokens associated with the specific programming language.

Main techniques used in this analysis include:

-deterministic finite-state automata

-regular expressions

We defined four different states for handling syntax and conventions in the Tiger language: INITIAL, COMMENT, STRING, and NPSTRING.

INITIAL state:

-the start state of any token sequence

-when special characters, symbols, integer numbers, and reserved words were encountered, a token of the matched type is produced based on the longest match rule

-accept "/*" to transition into a COMMENT state, and \" to transition into a STRING state

COMMENT state:

-discard any character encountered during COMMENT state except for the transition rules

-when "/*" is encountered during COMMENT state, a nested comment is detected, and we increase the comment count by 1

-when "*/" is encountered, decrease the comment count by 1, and transition to INITIAL if count is 0

STRING state:

-append every non-special character to the string builder when in STRING state

-when [\\] is encountered, the program transitions into NPSTRING state, whereby it is ready to take legal escape characters defined by the Tiger language

-accept \" to end the STRING state and transition into INITIAL state, producing a string token at the same time

NPSTRING state:

-accept zero or more number of [\n\ \t\f] as the escape sequence

-transition back to STRING state when another [\\] is encountered

-any other character is illegal in this case

Global variables defined:

-lineNum, an int to keep track of the line count for accurate error reporting

-linePos, an int to keep track of the character count in a line for accurate error reporting

-strBuilder, a string to build up a string during STRING state

-strPosition, an int to keep track of the character count of the start of a string

-uncloseStr, a Boolean to report any unclosed string

-cmCount, an int to keep a comment count for handling nested comments

Other modifications:

-we modified driver.sml to include some steps for better error handling: our error report contains the specific file name for the file that causes the error, and we reset the lineNum and linePos in ErrorMsg before every run of parsing.

-we modified the eof() function to report any unclosed string or unclosed comment, and if everything is right, return and print out the EOF token with a position count

Known bugs or problems:

-no bugs

-if you enter \ddd in a string literal with an illegal ASCII code in decimal, the lexer throws an uncaught exception