/go-compiler

Compiler written in Go for the Monkey programming language

Primary LanguageGoMIT LicenseMIT

go-compiler

Compiler written in Go for the Monkey programming language

The Monkey programming language:

Monkey language has the following language features:

  • Integers
  • Booleans
  • Strings
  • Arrays
  • Hashes
  • Prefix-, infix- and index operators
  • Conditionals
  • Global and local bindings
  • First-class functions
  • Return statements
  • Closures

Let’s see what Monkey looks like:

Here is how we bind values to names in Monkey:

let age = 1;
let name = "Monkey"; 
let result = 10 * (20 / 2);

Besides integers, booleans and strings, the Monkey interpreter we’re going to build will also support arrays and hashes. Here’s what binding an array of integers to a name looks like:

let myArray = [1, 2, 3, 4, 5];

And here is a hash, where values are associated with keys:

let thorsten = {"name": "Thorsten", "age": 28};

Accessing the elements in arrays and hashes is done with index expressions:

myArray[0] _// => 1_
thorsten["name"] _// => "Thorsten"_

The let statements can also be used to bind functions to names. Here’s a small function that adds two numbers:

let add = fn(a, b) { return a + b; };

But Monkey not only supports return statements. Implicit return values are also possible, which means we can leave out the return if we want to:

let add = fn(a, b) { a + b; };

And calling a function is as easy as you’d expect:

add(1, 2);

A more complex function, such as a fibonacci function that returns the Nth Fibonacci number, might look like this:

let fibonacci = fn(x) { 
    if (x == 0) { 
        0
    } else { 
        if (x == 1) { 
            1
        } else { 
            fibonacci(x - 1) + fibonacci(x - 2);
        } 
    } 
}; 

Note the recursive calls to fibonacci itself!

Monkey also supports a special type of functions, called higher order functions. These are functions that take other functions as arguments. Here is an example:

let twice = fn(f, x) { 
    return f(f(x));
};

let addTwo = fn(x) { 
    return x + 2;
}; 

twice(addTwo, 2); // => 6

Why Go?

  • Go is really easy to read and subsequently understand.
  • With Go’s universal formatting style thanks to gofmt and a testing framework built-in, we can concentrate on our interpreter and not worry about 3rd party libraries, tools and dependencies.

How the compiler works

A compiler is computer software that transforms computer code written in one programming language (the source language) into another computer language (the target language). Compilers are a type of translator that support digital devices, primarily computers. The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language, object code, or machine code) to create an executable program. -- Wikipedia

Compiling cycle

The variety between compilers is so high that we can’t make a lot of universal statements about their architecture. That being said, we can ignore the details for a minute now and sketch out the architecture of something like the archetype of a compiler[Thorsten Ball]: Compiling Cycle

Generation Order Now our project is going to take a different path from the Interpreter, now see don't have our REPL to see the results in a fast cycle. This is the actual Generation order for our compiler. Generation Order

Compilation Time vs RunTime

We need to make the difference between Compile Time and Runtime values, as far as this is going to impact our compiler. Compilation vs Runtime

Stack Machine (Virtual Machine).

  • Why a Virtual Machine?

    When we implement a programming language we want it to be universal. It should be able to execute all possible programs and not just – as one example of many – functions we built into it. Universal computation is what we’re after and computers offer a solid model for it.

    If we construct a programming language based on that model, it’s going to have the same computational capabilities as the computer. It’s also one of the fastest ways to execute programs.

    But if executing programs like a computer is the best and fastest way, why not, you know, just let the computer execute the programs? Portability. We could write a compiler for our programming language that allows us to execute the translated programs natively on a computer. These programs would be really fast. But we would also have to write a new compiler for every computer architecture we want to run our programs on. That’s a lot of work. Instead, we can translate our programs into instructions for a virtual machine. And the virtual machine itself runs on as many architectures as its implementation language. In the case of the Go programming language that’s pretty portable. -- Thorsten Ball

Bytecode

Virtual machines execute bytecode. Like the machine code that computers execute, bytecode is made up of instructions that tell the machine what to do. Push this, pop that, add these, call this function. It’s called bytecode because the opcodes contained in each instruction are one byte in size.

An “opcode” is the “operator” part of an instruction, sometimes also called “op”. The PUSH we’ve seen earlier is such an opcode, except that in our toy example it was a multi-byte string and not just one byte. In a proper implementation PUSH would just be the name that refers to an opcode, which itself is one byte wide. These names, like PUSH or POP, are called mnemonics. They help us programmers remember and talk about opcodes.

The operands (also called arguments or parameters) to these opcodes are also contained in the bytecode. They’re placed alongside each other, with the operands following the opcodes. The operands, though, aren’t necessarily one byte wide. For example, if an operand is an integer and greater than 255, it would take multiple bytes to represent it. Some opcodes have multiple operands, some just one and some don’t have any at all. Whether the bytecode is designed for a register, or a stack machine has a huge influence here.

You can imagine bytecode as a sequence of opcodes and operands, laid out in memory next to each other: -- Thorsten Ball Bytecode

Our Compiler is faster than our Interpreter!

We build it into its own executable:

$ go build -o fibonacci ./benchmark

And now, welcome, first on the stage, the evaluator:

$ ./fibonacci -engine=eval

engine=eval, result=9227465, duration=27.204277379s

27 seconds. Time for our final pat on the back:

$ ./fibonacci -engine=vm

engine=vm, result=9227465, duration=8.876222455s

8 seconds. 3.3 times faster.

References:

Writing a Compiler in Go (Thorsten Ball)

LICENSED UNDER MIT LICENSE