/subc-compiler

Repository for subset c compiler written in Lex and Yacc as part of Language Processors course at NITW

Primary LanguageYacc

#Documentation for SubsetC compiler

###Requirements: These requirements are for users running Debian based Operating Systems like Ubuntu/Fedora

One needs Flex(lex) and Bison(upward compatible with yacc) for lexical analyzer generator and parser generator

To install flex and yacc in Ubuntu:

  • sudo apt-get install flex
  • sudo apt-get install bison

To install flex and yacc in Fedora:

  • sudo yum install flex flex-devel
  • sudo yum install bison

Steps to execute the project:

  • lex subc-compiler.l
  • yacc subc-compiler.y
  • gcc y.tab.c -ll -ly
  • ./a.out test

or simply use shellscript to automate , run

  • chmod 777 build.sh (this is for making the script executable) execute by running

  • ./build.sh The output will be displayed and written into file named output Subsetc Compiler Lex File:

  • alpha [a-zA-Z]

  • digit [0-9]

Alpha and digit are definitions for alphabets and digits.

  • [\n]

for newline and increase count for line number

  • {digit}+

for identifying a number

  • {alpha}({alpha}|{digit})*

for identifying an identifier

  • //.* ;
  • /*(.\n).**/ ;

Regular Expression for finding comments

Rest of the file is self-explanatory.

##Yacc File:

  • Header files and some definitions/declarations are included in between %{...%} braces.
  • FILE* f1 is used for writing the intermediate code to a temporary file.
  • The code is finally printed from the file to the console after entire parsing is done.
  • Tokens returned from the lex file are declared in the initial section.

####Operator Precedence:

%right ASGN

%left LOR

%left LAND

%left BOR

%left BXOR

%left BAND

%left EQ NE

%left LE GE LT GT

%left '+' '-'

%left '*' '/' '@'

  • %left/%right are used for setting the associativity of various operators.
  • %right is used for "=" as it is a right associative operator.
  • %left is used for all remaining operators as all of them are left associative.
  • The ordering of operators even decides the operator precedence.
  • The operators having higher precedence are placed below the ones' with lesser precedence.
  • The operators having same precedence are present at the same level.

###Rules section:

  • The rules of the grammar are declared in the middle section of the yacc file.

pgmstart : TYPE ID '(' ')' STMTS ;

  • It's used for declaring return type of the function & then declaring a main function which is followed by a set of statements in a block.

STMTS : '{' STMT1 '}'| ; STMT1 : STMT STMT1 | ;

STMT : STMT_DECLARE //all types of statements | STMT_ASSGN
| STMT_IF | STMT_WHILE | STMT_SWITCH | ';' ;

  • These rules are used for declaring statements.
  • Each statement can further be either a Declaration stmt or Assignment stmt or If- Else stmt or While stmt or a switch stmt.

EXP : EXP LT{push();} EXP {codegen_logical();} | EXP LE{push();} EXP {codegen_logical();} | EXP GT{push();} EXP {codegen_logical();} | EXP GE{push();} EXP {codegen_logical();} | EXP NE{push();} EXP {codegen_logical();} | EXP EQ{push();} EXP {codegen_logical();} | EXP '+'{push();} EXP {codegen_algebric();} | EXP '-'{push();} EXP {codegen_algebric();} | EXP '*'{push();} EXP {codegen_algebric();} | EXP '/'{push();} EXP {codegen_algebric();} | EXP {push();} LOR EXP {codegen_logical();} | EXP {push();} LAND EXP {codegen_logical();} | EXP {push();} BOR EXP {codegen_logical();} | EXP {push();} BXOR EXP {codegen_logical();} | EXP {push();} BAND EXP {codegen_logical();} | '(' EXP ')' | ID {check();push();} | NUM {push();} ;

  • push function pushes the token discovered just before the push function, on to the stack.

  • The functions codegen_..... are executed after dicovering of an expression.

  • Though all the operators are kept under same rule, the order of declaration in the top declaration section gives precedence priority for all the operators of the grammar.

  • Similarly, depending on the structure of the other language constructs, remaining rules are defined and the corresponding functions are executed after parsing of that corresponding rule on the stack.

Code Section:

char st[1000][10]; int top=0; int i=0; char temp[2]="t";

int label[200]; int lnum=0; int ltop=0; int switch_stack[1000]; int stop=0; char type[10]; struct Table { char id[20]; char type[10]; }table[10000];

  • st stack is used for storing the node values.
  • label is used for storing the labels.
  • table is used for storing variables on to symbol table.
  • remng are helper variables for storing their attributes like top position etc.

functions:

  • push(): used for pushing node value on to the stack.
  • codegen_logical(): used for generating three address code for logical statements.
  • codegen_algebraic(): used for generating three address code for algebraic statements.
  • if_label(): for generating a label for the end of the if/if-else statement.
  • while_start(): used for generating the starting label for the while statement.
  • while_rep(): used for generating the if condition for the while statement.
  • while_end(): used for generating the code for moving back to start of the while statement.
  • check(): is used to check if the variable is already declared when referencing it.
  • STMT_DECLARE(): is used to check if the variable being declared is already declared previously.
  • Intermediatecode(): used for printing intermediate code once parsing is complete...

This documentation is signed off by Kranthi Kiran Guduru and Sri Krishna Shanmukh, B.Tech 3/4, Computer Science and Engineering, NIT Warangal.