A compiler, named jmm, which translates programs in Java--* into java bytecodes. It can be executed using the jasmin library and also has options for code optimization, which can be applied by making use of the -o
and -r=x
options.
*Based on the MiniJava proposed in Appel and Palsberg's“Modern Compiler Implementation in Java”, 2ndEdi-tion, pages 484-486
The syntatic error recovering was applied for while statments. The following code shows the recovery is done:
<WHILE> <OPENCURVEBRACKET>
try{
AndExpression() <CLOSECURVEBRACKET>
} catch(ParseException e){
hasError = true;
int lineError = e.getStackTrace()[0].getLineNumber();
String message = e.getMessage();
reportList.add(new Report(ReportType.ERROR, Stage.SYNTATIC, lineError, message));
Token t;
int counter = 0;
while(true) {
t = getToken(1);
if (t.kind != OPENBRACKET) getNextToken();
else break; // found a {
// Consecutives ). Init new while.
if (t.kind == CLOSECURVEBRACKET) break;
}
// Consume all consecutives ) if exists.
while(t.kind == CLOSECURVEBRACKET){
t = getToken(1);
if (t.kind == CLOSECURVEBRACKET) t = getNextToken();
}
}
Statement() #WhileBody
Putting it into words, once there's an error in the while loop statment the catch block will be executed. Inside it, a report will be added to the reports list indicating the it has occurred an error. Besides, the parser will start re-executing the from the first token that is not part of the while statment, so as to try to find more errors. This process is done to maximize the number of errors displayed for the programmer in the terminal.
The goals of the Semantic Analysis are to verify if the program is according to the definitions of the programming language, by reporting the semantic errors with useful messages to the user.
Our compiler implments the following semantic rules:
- Type Verification:
- The boolean operations && and ! must be between boolean operands;
- The boolean operation < must be between integers;
- Arithmetic operations +, -, *, / must be between integers;
- The type of the assignee must be equal to the type of the assigned (a_int = b_boolean not allowed);
- Conditional expressions (if e while) result in a boolean value.
- Array Access Verification:
- It is not possible to use array access directly to arithmetic operations;
- Ensure that an array access is done to an array (e.g. true[1] is not allowed);
- Verify if the index of an array access is an integer (e.g. a[true])
- Method Verification
- Verifing if the object of a certain class contains the method invoked;
- Case the method invoked is from the current class (call the method using this
this
) and there's no extends, then the code returns an error. Otherwise it's assumed that it's from the super class. - Case the method is not from a declared class (a importeded class), it's assumed as existent and the exptected types are assumed. (i.e. a = Foo.b(), if a is an integer and Foo is a imported class, it's assumed that the method b is static, since we are accessing a method directly from teh class that doesn't have arguments and return an integer);
- Verifying if the number of arguments is equals the number of parameters in the declaration.
- Verifying if the type of parameters are equals the type of the arguments.
- Array Initialization:
- The size of array should be and integer when creating a new one (e.g. a = new int[0];)
- Class declaration:
- The class extended needs to be imported.
- Length Builtin "length" only exists over arrays.
Code Generation: (describe how the code generation of your tool works and identify the possible problems your tool has regarding code generation.)
The AST is recursively converted to OLLIR (Optimized Low-Level Intermediate Representation). We created the class OllirEmitter, in which there is a function for converting each type of expression and statement. Auxiliar variables are created when necessary.
The optimizations we did to the code were the following:
Before generating the Ollir code, a ConstantPropagationVisitor, which extends the AJmmVisitor, was used to replace the values of constants in expressions. This visitor is applied until there are no changes in the AST.
Example of an optimized While Loop:
if(!condition) goto endLoop;
Loop:
codeBlock
if(condition) goto Loop;
endLoop:
codeBlock
All the jasmin basic structure is being generate including the constructor, fields and methods.
More than that, assignments and arithmetic operations are converted from the OLLIR code.
Also the limit_stack and limit_locals are being calculated dinamically. The approachs for defining each one of them is well defined:
-
The limit_locals is calculated according to this formula: 1 + number of parameters + number of local variables. The number 1 stands for the register 0 which is allocated by the instance of the class:
this
. -
To calculating the limit_stack we have developed an simple algorithm to calculate this. The pseudocode for it can be checked bellow:
int currentSize = 0; int maxUntilNow = 0; for (var instruction: instructions){ int numberOfPops = getNumberOfPops(instruction); int numberOfPushes = getNumberOfPush(instruction); int stackVariation = numberOfPushes - numberOfPops; maxUntilNow = max(currentSize, maxUntilNow); currentSize += stackVariation; }
In other words, each instruction contributes for the variation of the stack size. This variation is computed in the currentSize
variable, while the jasmin is being produced. However, in the end the max size for the variable is saved in the maxUntilNow
variable.
By using this algorithm, it's possible to calculate the limit stack in an efficient way.
The optimization -r is a process that contains three types of algorithms:
- The dataflow analysis
- Calculating the liverange and the interferences between variables
- Apply the coloring graph
About the -r optimization it's important to discuss some aspects that might defeer from other works of this subject.
The concept of live ranging and how to calculate is one of the aspects that we think that might be controversial. In our project we consider that the live range is the all the instructions between the first definition
of the variable and the last in
of it.
The in
was calculated in the process of dataflow.
The coloring heuristics follows the algorithm apresented in the theorical classes: Kempe’s graph-coloring algorithm. The algorithm can be checked in this link.
- Checkpoint 1:
The first checkpoint was made in group by pair programming and sharing between the members using tools such as
code with me
from IntelliJ. - Checkpoint 2: In the second checkpoint the group divided the tasks equally. Just the symbol table creation wasn't divided, but made by the whole group together.
- Checkpoint 3:
In this checkpoint the tasks were divided in the following way:
- OLLIR: Diana Freitas and Diogo Samuel
- JASMIN: Hugo Guimarães and Juliane Marubayashi
- Checkpoint 4:
In this checkpoint the tasks were divided in the following way:
- -o optimization: Diana Freitas and Diogo Samuel
- -r optimization: Juliane Marubayashi
We did everything that was suppose to do, trying to always use the best design patterns to make the code modular and easy to understand. We also did the optimizations -o and -r.
We don't sanitize the variables. For example, 'array' and '$' are reserved in OLLIR and if the code has these variables it will break and OLLIR will not run.
The -r optimization could be better when showing error messages. Once it's not possible to make the graph coloring for certain number of registers the code generation is interrupted by an OPTIMIZER
exception. However in some cases that the number of register is too low, another error might appear, which does not describe correctly the nature of the exception.