- Used hash of hash table to store all the data
- The Scope table keeps track of the current and parent scopes
- Problem Specifications
-
Takes stream of
lexemes
and produces a stream oftokens
-
Reports lexical errors with corresponding line number and error message
- Unfinished multi line comment
/*this is a comment
thenEOF
is found - Too many decimal points
2.32.3
- Ill formed number
1E10.7
- Empty character
''
- Multi character constant
'abd'
- Unfinished character
'a
- Unfinished String
"Unfinished String
- Invalid prefix on ID or invalid suffix on Number
12ABc
- Any Unrecognized character
- Unfinished multi line comment
-
The IO folder contains some sample input output
- Need to install
flex
on yourlinux
machine orWSL
flex -o sample.c 1805082.l
g++ sample.c -lfl -o sample.o
./sample.o
- Used
bison
andflex
- Problem Specifications
- Grammar
- There can be multiple functions. No two functions will have the same name. A function needs to be defined or declared before it is called. Also, a function and a global variable cannot have the same symbol.
- There will be no pre-processing directives like
#include
or#define
. - Variables can be declared at suitable places inside a function. Variables can also be declared in the global scope.
- Precedence and associativity rules are as per standard. Although we will ignore consecutive logical operators or consecutive
relational operators like,
a && b && c
,a < b < c
. - No
break
statement andswitch-case
statement will be used. println(n)
is used instead ofprintf(“%d\n”, n)
to simplify the analysis, wheren
is a declared variable.
Some common syntax errors are handled and recovered so that the parser does not stop parsing immediately after recongnizing an error.
Type Checking
- Generates error message if operands of an assignment operator are not consistent with each other. The second operand of the assignment operator will be an expression that may contain numbers, variables, function calls, etc.
- Generates an error message if the index of an array is not an integer.
- Both the operands of the modulus operator should be integers. During a function call all the arguments should be consistent with the function definition.
- A void function cannot be called as a part of an expression.
Type Conversion
Conversion from float to integer in any expression generates an error. Also, the result of RELOP and LOGICOP operations are integers.Uniqueness Checking
Checks whether a variable used in an expression is declared or not. Also, checks whether there are multiple declarations of variables with the same ID in the same scope.Array Index
Checks whether there is an index used with array and vice versa.Function Parameters
Check whether a function is called with appropriate number of parameters with appropriate types. Function definitions should also be consistent with declaration if there is any. Besides that, a function call cannot be made with non-function type identifier.After the syntax analyser and the semantic analyser confirms that the source program is correct, the compiler generates the intermediate code. Ideally a three address code is generated in real life compilers. But we have used 8086 Assembly Language
as our intermediate code so that we can run it in emu 8086
and justify that our compilation is correct.
We have generated the intermediate code on the fly
. Which means that, instead of using any data structure and passing the whole code one after another to the production rules of the grammar, we have generated the intermediate code as soon as we match a rule and write it in the code.asm file. To do that, we have to use the PUSH
and POP
instructions in the assembly code which utilize the stack.
Some Peephole
optimizations
- Remove redundant push and pop instructions.
-
If the first instruction is push and the second is pop and those contains the same address or register then we can remove both the instructions. For exammple,
*code.asm *optimized_code.asm PUSH AX -> ;PUSH AX POP AX ;POP AX
-
If the first is push and the second is a pop containing a register or an address then we can replace the two instructions with one MOV instruction. For example,
*code.asm *optimized_code.asm PUSH [BP + -2] -> MOV AX, [BP + -2] POP AX
-
Remove redundant move instructions
- If a move instruction has the same source and destination then we can remove the instruction. For example,
*code.asm *optimized_code.asm MOV AX, AX -> ;MOV AX, AX
- If consecutive two instructions are move and the first instruction contains the same register or address as the second instruction then we can remove the first instruction. For example,
*code.asm *optimized_code.asm MOV AX, BX -> ;MOV AX, BX MOV AX, CX MOV AX, CX
- If consecutive two instructions are move and the source of the first instruction is the destination of the second instruction and the source of the second one is the destination of the first then we can remove the second instruction. For example,
*code.asm *optimized_code.asm MOV AX, BX -> MOV AX, BX MOV BX, AX ;MOV BX, AX
- If a move instruction has the same source and destination then we can remove the instruction. For example,
-