This repository contains code and resources for various phases of a C compiler. Each phase is implemented separately to understand the workings of a compiler better. More details regarding code in the report provided in the repo.
The lexical analyzer is the first phase of the compiler, responsible for converting the source code into a sequence of tokens. This phase removes comments, white spaces, and identifies keywords, identifiers, constants, operators, and punctuation symbols
To use the lexical analyzer, you can compile the provided code (lexAnalyzer.c
) using a Lex tool ;
lex lexAnalyzer.l
cc lex.yy.c
./a.out < testCases/ifelse.c
The Symbol Table will be generated in symbolTable.txt file. There is also a lex file to remove comments from a given .c file
lex commentRemover.l
cc lex.yy.c
./a.out < TestCases/ifelse.c
A new file called output.c will be created.
In this phase syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers.Checks the expressions from this token are syntactically correct and create a parse tree
lex parseTree.l
yacc -d parseTree.y
cc lex.yy.c y.tab.c
./a.out < TestCases/forloop.c
Parse tree will be printed in the terminal with its preorder traversal.
In this phase, we extract necessary semantic information from the source code which is impossible to detect in parsing.
lex scanner.l
yacc -d parser.y
cc lex.yy.c y.tab.c
./a.out < TestCases/forloop.c
Parsing result will be printed in the terminal.
We are trying to generate language independent three-address code for a given source program which is lexically,syntactically and semantically correct.
lex ICG.l
yacc -d ICG.y
cc lex.yy.c y.tab.c
./a.out < TestCases/forloop.c
Output will be in ICG.txt file
In this phase,the machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power.
Paste ICG generated in previous phase in input.txt
python optimizer.py input.txt
Output will be in optimized_icg.txt
The final phase in our compiler model is the code generator. It takes as input the intermediate representation (IR) produced by the front end of the compiler, along with relevant symbol table information, and produces semantically equivalent target programs.
Paste ICG generated in previous phase in icg.txt
python assembly.py icg.txt
Output will be in icg.s