/minij

A toy implementation of a compiler written in python

Primary LanguagePythonMIT LicenseMIT

Minij

Table of contents

About

A toy implementation of a compiler written in python. The compiler supports a language with a syntax similar to Java.

The program is a very complete piece of software, as it can not only handle errors generated by the program itself, but it can also handle external issues such as file management and allocation. In addition, the program was subjected to a large number of tests.

How It Works

The compiler is composed of several phases which are linked to each other.

Lexical Analyzer (Lexer)

This is the first phase of the compiler, here all the input text is read so that it is later separated by words using the blank spaces and special characters of the grammar.

Syntax Analyzer (Parser)

This is the second phase of the compiler. In this case we are using a SLR or LR(0) parser, because it had the least amount of conflicts and states produced for our grammar. The parser reads all the tokens found by the lexer and analyzes them conforming to the rules of our grammar.

If you need more information check the Analysis or Parsing Table files.

Semantic Analyzer

The semantic analyzer checks for errors in that the lexer and parser couldn't catch. For that we have some atributes for the symbols that we found:

  • Class declaration is added to the symbol table only if the classes and interfaces that extends and implements are declared and there isn't any other class with the same name. It has attributes like scope, category and type.
  • Interface declaration is added to the symbol table only if there isn't any other interface with the same name. It has atributes like scope, category and type.
  • Function declaration is added to the symbol table only if there isn't any other function with the same name. It has atributes like scope, category, parameters and type.
  • Variable declaration is added to the symbol table only if there isn't any other variable with the same name. It has atributes like scope, category, value and type.
  • Object declaration is added to the symbol table only if there isn't any other variable with the same name and the type is a valid class. It has atributes like scope, category, value and type.
  • Function call is added to the symbol table only if the funtion is declared before the call and the parameters of the call are the same as in the declaration. It has atributes like scope, category, parameters and type.
  • Object Accessing is added to the symbol table only if the object is from a valid class and in that class the property exists. It has atributes like scope, category, value and type.

Grammar Explanation

The syntax of the language is very similar to the syntax of Java, which is why the tokens are practically the same. There are some considerations that you should keep in mind, such as:

  • The language is case sensitive, such that if is a reserved word but IF is an identifier.
  • All the comments must have an end.
  • All the strings must have an end.
  • No multiline strings are allowed.

Reserved words

These are a collection of keywords that are predefined identifiers that have special meaning to the compiler. They cannot be used in your program.

  • void
  • int
  • double
  • boolean
  • string
  • class
  • const
  • interface
  • null
  • this
  • extends
  • implements
  • for
  • while
  • if
  • else
  • return
  • break
  • New
  • System
  • out
  • println

Identifiers

An identifier is a sequence of letters, digits and dollar sign. It can start with anything except a number.

// bad
float 1number = 3.4;

// good
int number1 = 1;
int anotherNumber1 = 1;

Comments

The language supports single and multi line comments. This could be useful to explain parts of the code and make it more readable.

  • For single line comments use //.
  • For multi line comments use /* to start and */ to end.

Constants

  • The boolean constants are true or false.

  • A integer constant can be expressed in base 10 or base 16.

    • In base 10 it must be only a sequence of digits from 0 to 9.
    • In base 16 it must start with 0x or 0X and be followed by any hexadecimal value.
// bad
int bad = 98.1;
int another = .12;

// good
int base10 = 8;
int anotherBase10 = 012;
int base16 = 0X0;
int anotherBase16 = 0x12aE;
  • A double constant must be a sequence of digits followed by a point and:
    • A sequence of numbers or nothing.
    • An exponential notation with the sign of the exponent.
// bad
double bad = 32.1.2;
double another = .12;
double badExponential = .2e-3;

// good
double number = 8.;
double anotherNumber = 3.82;
double exponential = 12.E3;
int anotherExponential = 3.2e-2;
  • The string constant is a sequence of characters surround by "", this cannot contain:
    • A double quote.
    • A new line.
    • Null character.

Getting Started

These instructions will get you a copy of the project up and running on your machine. If you don't want to compile it, you can download the executable for your OS.

1. Install dependencies

2. Get the code

git clone https://github.com/betoSolares/minij.git

3.Compile

cd minij
pip install -r requirements/build.txt
make build

These should create a directory called build with an executable file inside called minij.

Usage

The program is designed to be run through the command line, if you try to run it through a GUI you may encounter some errors.

The compiler can receive any plain text file regardless of its extension and can only work with one file at a time. After analyzing the file, all the errors and warnings found are printed on the screen.

The basic instruction to use the program would be the following:

./minij <input file>

If you need more information about the options you can pass use minij -h or minij --help

NOTE keep in mind that minij is the name of the executable file and keep in mind that we are assuming that the executable file is in the same directory that we are in, if this isn't your case please use the absolute path to the executable file.

Examples

/*
  This is a simple hello world in java
  that can be parsed by minij
*/

class Main {
    void main(string[] args) {
        System.out.println("This will be printed");
    }
}

More examples can be found here.

Attribution

You are free to copy, modify and distribute the project with attribution under the terms of the MIT License. See the LICENSE file for more information.