handmade-compiler
This project implements a lexical analyzer and a syntax analyzer for a simplified Java programming language. The final goal is a complete compiler.
Usage
lex_and_parse.py test2.java
Simplified Java
< Lexical specifications >
- Variable type
- int for a signed integer
- char for a single character
- boolean for a Boolean string
- String for a literal string
- Signed integer
- A single zero digit (e.g., 0)
- A non-empty sequence of digits, starting from a non-zero digit
- (e.g., 1, 22, 123, 56, … any non-zero positive integers)
- (e.g., 001 is not allowed)
- A non-empty sequence of digits, starting from a minus sign symbol and a non-zero digit
- (e.g., -1, -22, -123, -56, … any non-zero negative integers)
- Single character
- A single digit, English letter, blank, or any symbol, starting from and terminating with a symbol ' (e.g., 'a', '1', ' ', '&')
- Boolean string
- true and false
- Literal string
- Any combination of digits, English letters, and blanks, starting from and terminating with a symbol " (e.g., "Hello world", "My student id is 12345678")
- An identifier of variables and functions
- A non-empty sequence of English letters, digits, and underscore symbols, starting from an English letter or an underscore symbol
- (e.g., i, j, k, abc, ab_123, func1, func_, func_bar)
- Keywords for special statements
- if for if statement
- else for else statement
- while for while statement
- class for class statement
- return for return statement
- Arithmetic operators
- +, -, *, and /
- Assignment operator
- =
- Comparison operators
- <, >, ==, !=, <=, and >=
- A terminating symbol of statements
- ;
- A pair of symbols for defining area/scope of variables and functions
- { and }
- A pair of symbols for indicating a function/statement
- ( and )
- A pair of symbols for using an array
- [ and ]
- A symbol for separating input arguments in functions
- ,
- Whitespaces
- A non-empty sequence of \t, \n, and blank characters
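The lexical specifications above can be sketched as a small regex-based tokenizer. This is only an illustration, not necessarily how lex_and_parse.py is written. The token names follow the terminals used in the CFG (vtype, id, num, literal, character, boolstr, addsub, multdiv, assign, comp, semi, and so on); the names `lbracket` and `rbracket`, the helper `tokenize`, and the rule for deciding whether `-` is a sign or an operator are assumptions made for this example.

```python
import re

# Order matters: "num" precedes "addsub" so "-1" is tried as a number first,
# and "comp" precedes "assign" so "==" is not split into two "=".
TOKEN_SPEC = [
    ("whitespace", r"[ \t\n]+"),
    ("word",       r"[A-Za-z_][A-Za-z0-9_]*"),
    # 0, or an optional minus followed by a non-zero digit; the lookahead
    # rejects leading zeros such as 001.
    ("num",        r"(?:-?[1-9][0-9]*|0)(?![0-9])"),
    ("character",  r"'[^'\n]'"),
    ("literal",    r'"[A-Za-z0-9 ]*"'),
    ("comp",       r"==|!=|<=|>=|<|>"),
    ("assign",     r"="),
    ("addsub",     r"[+-]"),
    ("multdiv",    r"[*/]"),
    ("semi",       r";"),
    ("lbrace",     r"\{"),
    ("rbrace",     r"\}"),
    ("lparen",     r"\("),
    ("rparen",     r"\)"),
    ("lbracket",   r"\["),   # assumed name; the CFG does not use arrays
    ("rbracket",   r"\]"),   # assumed name
    ("comma",      r","),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

VTYPES = {"int", "char", "boolean", "String"}
KEYWORDS = {"if", "else", "while", "class", "return"}
BOOLS = {"true", "false"}
# Assumption: after one of these tokens, "-" is an operator, not a sign.
OPERAND_END = {"id", "num", "rparen", "rbracket"}


def tokenize(source):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            if text in VTYPES:
                kind = "vtype"
            elif text in BOOLS:
                kind = "boolstr"
            elif text in KEYWORDS:
                kind = text          # keywords are their own terminals
            else:
                kind = "id"
        elif kind == "num" and text.startswith("-") and tokens \
                and tokens[-1][0] in OPERAND_END:
            # "b-1" is read as id, addsub, num rather than id, num.
            kind, text = "addsub", "-"
        if kind != "whitespace":
            tokens.append((kind, text))
        pos += len(text)
    return tokens
```

For example, `tokenize("x = y-1;")` yields `[('id', 'x'), ('assign', '='), ('id', 'y'), ('addsub', '-'), ('num', '1'), ('semi', ';')]`, while `tokenize("001")` raises `SyntaxError`.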
< CFG G >
- S -> CODE
- CODE -> CDECL CODE
- CODE -> FDECL CODE
- CODE -> VDECL CODE
- CODE -> ''
- VDECL -> vtype id semi
- VDECL -> vtype ASSIGN semi
- ASSIGN -> id assign RHS
- RHS -> EXPR
- RHS -> literal
- RHS -> character
- RHS -> boolstr
- EXPR -> T addsub EXPR
- EXPR -> T
- T -> F multdiv T
- T -> F
- F -> lparen EXPR rparen
- F -> id
- F -> num
- FDECL -> vtype id lparen ARG rparen lbrace BLOCK RETURN rbrace
- ARG -> vtype id MOREARGS
- ARG -> ''
- MOREARGS -> comma vtype id MOREARGS
- MOREARGS -> ''
- BLOCK -> STMT BLOCK
- BLOCK -> ''
- STMT -> VDECL
- STMT -> ASSIGN semi
- STMT -> if lparen COND rparen lbrace BLOCK rbrace ELSE
- STMT -> while lparen COND rparen lbrace BLOCK rbrace
- COND -> COND comp C
- COND -> C
- C -> boolstr
- ELSE -> else lbrace BLOCK rbrace
- ELSE -> ''
- RETURN -> return RHS semi
- CDECL -> class id lbrace ODECL rbrace
- ODECL -> VDECL ODECL
- ODECL -> FDECL ODECL
- ODECL -> ''
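To show how the productions can be checked against a token stream, here is a minimal recursive-descent recognizer for the EXPR, T, and F rules only. It is a sketch under assumptions: the class `ExprRecognizer`, the helper `accepts_expr`, and the end marker `$` are hypothetical names, the input is a list of terminal names such as `id` or `addsub`, and the project itself may use a different parsing technique (for example, a table-driven parser).

```python
class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


class ExprRecognizer:
    def __init__(self, kinds):
        # "$" is an assumed end-of-input marker appended to the stream.
        self.kinds = list(kinds) + ["$"]
        self.pos = 0

    def peek(self):
        return self.kinds[self.pos]

    def expect(self, kind):
        if self.peek() != kind:
            raise ParseError(
                f"expected {kind}, found {self.peek()} at token {self.pos}"
            )
        self.pos += 1

    def expr(self):
        # EXPR -> T addsub EXPR | T
        self.term()
        if self.peek() == "addsub":
            self.expect("addsub")
            self.expr()

    def term(self):
        # T -> F multdiv T | F
        self.factor()
        if self.peek() == "multdiv":
            self.expect("multdiv")
            self.term()

    def factor(self):
        # F -> lparen EXPR rparen | id | num
        if self.peek() == "lparen":
            self.expect("lparen")
            self.expr()
            self.expect("rparen")
        elif self.peek() in ("id", "num"):
            self.pos += 1
        else:
            raise ParseError(
                f"unexpected {self.peek()} at token {self.pos}"
            )


def accepts_expr(kinds):
    """Return True if the terminal names in `kinds` form a complete EXPR."""
    parser = ExprRecognizer(kinds)
    try:
        parser.expr()
        parser.expect("$")   # no trailing tokens allowed
    except ParseError:
        return False
    return True
```

Because EXPR and T are right-recursive, each method can decide what to do by looking at a single upcoming token, so no backtracking is needed. For instance, `accepts_expr(["id", "addsub", "num", "multdiv", "lparen", "id", "rparen"])` returns `True`, and `accepts_expr(["id", "addsub"])` returns `False`.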