/tiger

Compiler for the tiger programming language

Primary LanguageStandard ML

Compilers - CS4110

  • Gary Soeller
  • Nathan Lilienthal
  • James McNamara

Writeups are included for each section of the compiler:

Setup

Setting up things on OS X is easy thanks to Homebrew. Simple run the following command:

$ brew install smlnj rlwrap

Usage

We'll make use of the CM module for building the projects. To use it you need a sources.cm file. Then run the following commands.

$ rlwrap sml sources.cm tests.sml

Lexer

We handled comments by creating a structure which acts as a stack containing line positions where comments were opened. As the lexer sees a /* string in the initial state, it adds the line position to the stack. When it sees a */ string in the comment state, it pops a position from the stack. Using a stack in this way, we are able to tell if all of the comments are eventually closed or not.

Our error handling for strings and comments was done in the EOF function. In a similar way to how we had a structure to handle comments, we also had one to handle strings. In the EOF function, we checked the state of both of these structures to ensure comments and strings were fully closed. If one of them is not, we print an error and throw an appropriate exception. If all comments and strings are closed, we simply return the EOF token.

One aspect of our code which we think will be of great importance as we progress through the compiler is how we tested our code. We developed our own test structure which allows us to run tests in a repeatable manner. Our testing code includes statistics on how many tests pass or fail as well as verbose output when a test fails.

Another interesting aspect of our compiler is that we use a lookup table to manage ids and keywords. Using this lookup table, we are able to easily map strings to reserved keywords and seen ids.

Parser

We encountered a few shift/reduce conflicts with our grammar. The first two were due to an ambiguity with our type and function declarations. The parser was unsure whether or not it should read(shift) a new declaration or reduce after each one. The desired action was to greedily read all of the declarations and then reduce. By adding a precedence to each declaration list, we achieved our desired outcome. Although we fixed one conflict, another arose regarding left brackets and ids. This was easily fixed as we want our parser to shift when it sees a left bracket after an id rather than reduce.

The next major s/r conflict was the dangling else problem. If we read the statement if 1 then if 2 then 3 else 4, we want it to be equivalent to (if 1 then (if 2 then 3 else 4)). This was solved by setting the precedence of else to be higher than if making it bind to the nearest if. For descriptions set of each precedence levels see parser/tiger.grm.

It's worth noting that things like a := b := c will parse into an AST, which is technically correct, in the grammar. However, the semantics of a valueless expression will deny this kind of expression at a later phase of the compiler.