A simple SysY compiler written in Java
Coming soon.
- ANTLR parser
- Type check
- Convert to my AST
- Resolve symbols
- Linear IR design
- IR Interpreter
- CFG construction
- Dominator tree construction
- SSA transformation
- Local optimization/analysis
- Local Value Numbering(LVN)
- Local Dead Code Elimination(DCE)
- Loop Identification(LI)
- Loop invariant
- Code motion
- Global optimization/analysis
- Dataflow analysis framework
- Constant propagation
- Reaching definition
- Live variables
- Dataflow analysis framework
- Inter procedural optimization (optional)
- Code generation
- X86-64
- risc-v 32
- ARM V7 (Contest Required)
- Peephole optimization
- Pattern matching
Grammar and Semantics defined here may differ slightly to the original version from BUAA.
All arrays are of constant length.
When used as function parameters, the first dimension of the array can be omitted.
int x = 20005;
int global_arr_err[x][114]; // not allowed!
const int N = 20005;
int global_arr[N][114]; // correct
int foo(int arr3[][114][514], int arr2[][514], int arr0) {
}
int main() {
int arr[255][114][514];
foo(arr, arr[1], arr[1][2][3]);
}
Arrays can be initialized using bracket expressions. Positions not specified in global arrays are initialized to 0s.
int arr[114][514] = {
{ 1, 1, 4, 5, 1, 4},
{ 1, 9, 1, 9, 8, 1, 0},
6, 6, 6
};
const expressions can be evaluated at compile time.
const int N = 114;
const int constA[N * 2 + 1] = { N, 514 };
const int constB[constA[0]][2] = { { 1919, 810 } };
int main() {
const int local[4] = { 1, 2, 3, 4 };
return constB[1][0] - local[N % 3]; // 1918
}
Functions return Void
Int
Flt
only.
Global Variables and Arrays are automatically allocated in global space.
Local Arrays are allocated when the function it belongs to executes.
Local Variables can be treated as virtual registers.
A program in the form of IR consists of several functions and allocations.
i.e. A program is a List<Fun | Alloc>
, where each Fun
contains a List<Instr>
that makes up the function body.
- Val. Literals including
Int
Flt
Ptr
Void
- Var. All local variables can be used directly without declaration.
A definition is a raw assignment in the form of var = rhs
, where var
must be Var
and rhs
can be either Val
or Var
.
Binary expressions are in the form of var = OP lhs rhs
.
ADD
,FADD
,PADD
. Addition OPs, prefix stands forInt
Flt
andPtr
ADD.SUB
,FSUB
.MUL
,FMUL
.DIV
,FDIV
.MOD
,LT
,LE
,GT
,GE
,NE
,EQ
.AND
,OR
var = OP sub
NEG
,POS
NOT
TOF
,TOI
. Conversion betweenFlt
andInt
.
var = CALL funSig [argList]
Function Signature is a 4-tuple (name, retType, [argTypes], [params])
that identifies this function.
var = LOAD loc
, where loc
is a Var
of type Loc
.
loc = STORE rhs
, where loc
is a Var
of type Loc
.
Jump. JMP label
BR cond lTru lFls
reads cond
and determine
- if the value of
cond
is zero, then gotolFls
. - otherwise, goto
lTru
.
Labels mark positions that JMP
and BR
can transfer their control flow to.
RET var
.
A RET var
ends current function, take var
as the return value and assign it to lhs
of the call site.
TODO
Trivial type for trivial values.
An Int
typed variable holds a 32-bit 2's-complement encoded signed integer.
A Flt
typed variable holds a 32-bit IEEE-754 standard floating point number.
Loc
is a type constructor that takes a BaseType T
and returns a new Type Loc<T>
.
A Loc<T>
typed variable points to a location that holds values of type T
.