handmade-compiler

Implement the lexical analyzer and syntax analyzer for a simplified Java programming language. Finally, implement the compiler.

Usage

lex_and_parse.py test2.java

Simplified Java

< Lexical specifications >

  • Variable types
    • int for a signed integer
    • char for a single character
    • boolean for a Boolean string
    • String for a literal string
  • Signed integer
    • A single zero digit (e.g., 0)
    • A non-empty sequence of digits, starting with a non-zero digit
    • (e.g., 1, 22, 123, 56, … any positive integer)
    • (e.g., 001 is not allowed)
    • A minus sign followed by a non-empty sequence of digits that starts with a non-zero digit
    • (e.g., -1, -22, -123, -56, … any negative integer)
  • Single character
    • A single digit, English letter, blank, or any symbol, enclosed in single quotes ' (e.g., 'a', '1', ' ', '&')
  • Boolean string
    • true and false
  • Literal string
    • Any combination of digits, English letters, and blanks, enclosed in double quotes " (e.g., "Hello world", "My student id is 12345678")
  • An identifier of variables and functions
    • A non-empty sequence of English letters, digits, and underscore symbols, starting with an English letter or an underscore symbol
    • (e.g., i, j, k, abc, ab_123, func1, func_, func_bar)
  • Keywords for special statements
    • if for if statement
    • else for else statement
    • while for while statement
    • class for class statement
    • return for return statement
  • Arithmetic operators
    • +, -, *, and /
  • Assignment operator
    • =
  • Comparison operators
    • <, >, ==, !=, <=, and >=
  • A terminating symbol of statements
    • ;
  • A pair of symbols for defining the scope of variables and functions
    • { and }
  • A pair of symbols for enclosing function parameters, statement conditions, and grouped expressions
    • ( and )
  • A pair of symbols for using an array
    • [ and ]
  • A symbol for separating input arguments in functions
    • ,
  • Whitespaces
    • A non-empty sequence of \t, \n, and blanks
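One way to turn the lexical specification above into code is a single regular expression of named alternatives, tried in priority order. This is a minimal sketch, not the project's `lex_and_parse.py`: most token names are the terminals of CFG G, while `ws`, `word`, `lbrack` and `rbrack` are hypothetical names added here. Two further points are assumptions the specification does not spell out: "any symbol" inside a character is read as any printable ASCII character, and a `-` that directly follows an operand (as in `b-1`) is treated as subtraction rather than the sign of a negative integer.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (token name, lexeme)

VTYPES = {"int", "char", "boolean", "String"}
KEYWORDS = {"if", "else", "while", "class", "return"}
BOOLSTRS = {"true", "false"}

# Python's re tries alternatives left to right, so list order is priority:
# signed integers before the minus operator, and two-character comparison
# operators before "<", ">" and "=".
TOKEN_SPEC = [
    ("ws",        r"[ \t\n]+"),
    ("num",       r"0(?![0-9])|-?[1-9][0-9]*"),  # a lone 0, or no leading zero
    ("character", r"'[ -~]'"),                   # one printable ASCII char (assumption)
    ("literal",   r'"[A-Za-z0-9 ]*"'),           # digits, letters and blanks only
    ("word",      r"[A-Za-z_][A-Za-z0-9_]*"),    # id, vtype, keyword or boolstr
    ("comp",      r"==|!=|<=|>=|<|>"),
    ("assign",    r"="),
    ("addsub",    r"[+-]"),
    ("multdiv",   r"[*/]"),
    ("semi",      r";"),
    ("lbrace",    r"\{"), ("rbrace", r"\}"),
    ("lparen",    r"\("), ("rparen", r"\)"),
    ("lbrack",    r"\["), ("rbrack", r"\]"),     # hypothetical names for [ and ]
    ("comma",     r","),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# After one of these tokens, "-" is read as binary subtraction (assumption).
OPERAND_END = {"id", "num", "character", "literal", "boolstr", "rparen", "rbrack"}


def tokenize(src: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"invalid token at offset {pos}: {src[pos:pos + 10]!r}")
        kind, text = m.lastgroup, m.group()
        if kind == "num" and text.startswith("-") and tokens and tokens[-1][0] in OPERAND_END:
            # "b-1" means b minus 1: emit only the "-" and rescan the digits.
            tokens.append(("addsub", "-"))
            pos += 1
            continue
        if kind == "word":
            if text in VTYPES:
                kind = "vtype"
            elif text in KEYWORDS:
                kind = text  # keywords use their own spelling as the token name
            elif text in BOOLSTRS:
                kind = "boolstr"
            else:
                kind = "id"
        if kind != "ws":
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

For example, `tokenize("int x = -5;")` keeps `-5` as one `num`, whereas `tokenize("int a = b-1;")` yields `id`, `addsub`, `num` for `b-1`, and an input such as `001` raises `SyntaxError` because neither integer alternative accepts a leading zero.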

< CFG G >

  1. S -> CODE
  2. CODE -> CDECL CODE
  3. CODE -> FDECL CODE
  4. CODE -> VDECL CODE
  5. CODE -> ''
  6. VDECL -> vtype id semi
  7. VDECL -> vtype ASSIGN semi
  8. ASSIGN -> id assign RHS
  9. RHS -> EXPR
  10. RHS -> literal
  11. RHS -> character
  12. RHS -> boolstr
  13. EXPR -> T addsub EXPR
  14. EXPR -> T
  15. T -> F multdiv T
  16. T -> F
  17. F -> lparen EXPR rparen
  18. F -> id
  19. F -> num
  20. FDECL -> vtype id lparen ARG rparen lbrace BLOCK RETURN rbrace
  21. ARG -> vtype id MOREARGS
  22. ARG -> ''
  23. MOREARGS -> comma vtype id MOREARGS
  24. MOREARGS -> ''
  25. BLOCK -> STMT BLOCK
  26. BLOCK -> ''
  27. STMT -> VDECL
  28. STMT -> ASSIGN semi
  29. STMT -> if lparen COND rparen lbrace BLOCK rbrace ELSE
  30. STMT -> while lparen COND rparen lbrace BLOCK rbrace
  31. COND -> COND comp C
  32. COND -> C
  33. C -> boolstr
  34. ELSE -> else lbrace BLOCK rbrace
  35. ELSE -> ''
  36. RETURN -> return RHS semi
  37. CDECL -> class id lbrace ODECL rbrace
  38. ODECL -> VDECL ODECL
  39. ODECL -> FDECL ODECL
  40. ODECL -> ''
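Rules 13 to 19 (EXPR, T and F) are small enough to hand-code. The sketch below is a recursive-descent recognizer for just those rules; the README does not say which parsing technique `lex_and_parse.py` uses, so this is an illustration rather than the project's parser. It assumes the input is already a list of (terminal, lexeme) pairs, and `ExprParser` and `parse_expr` are hypothetical names. Each nonterminal parses its leading T or F first and then peeks at one token to choose between the two alternatives, which is the grammar left-factored by hand.

```python
from typing import List, Tuple, Union

Token = Tuple[str, str]   # (terminal name, lexeme), e.g. ("id", "x")
Tree = Union[str, tuple]  # a leaf lexeme, or (operator, left, right)


class ExprParser:
    """Recursive descent for rules 13-19 of CFG G:

    EXPR -> T addsub EXPR | T
    T    -> F multdiv T   | F
    F    -> lparen EXPR rparen | id | num
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        # "$" marks the end of the input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "$"

    def expect(self, kind: str) -> str:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def expr(self) -> Tree:
        left = self.term()
        if self.peek() == "addsub":        # rule 13
            op = self.expect("addsub")
            return (op, left, self.expr())
        return left                        # rule 14

    def term(self) -> Tree:
        left = self.factor()
        if self.peek() == "multdiv":       # rule 15
            op = self.expect("multdiv")
            return (op, left, self.term())
        return left                        # rule 16

    def factor(self) -> Tree:
        if self.peek() == "lparen":        # rule 17
            self.expect("lparen")
            inner = self.expr()
            self.expect("rparen")
            return inner
        if self.peek() in ("id", "num"):   # rules 18 and 19
            return self.expect(self.peek())
        raise SyntaxError(f"unexpected {self.peek()} at token {self.pos}")


def parse_expr(tokens: List[Token]) -> Tree:
    parser = ExprParser(tokens)
    tree = parser.expr()
    if parser.pos != len(tokens):
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return tree
```

Because rule 13 is right-recursive, this sketch groups `a - b - c` as `a - (b - c)`; code that evaluates the tree would need to account for that if subtraction is meant to associate to the left.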