/compiler

The adventures of a Haskell compiler

Primary LanguageHaskellGNU General Public License v3.0GPL-3.0

Compiler Quest

The main goal is to continually level-up a self-hosting Haskell compiler. However, it’s a survival game as well as an RPG: it should always be possible to build this compiler starting with only a C compiler.

I had thought this was a daunting challenge. In the past, constructing a parser alone was a laborious grind for me. Then it got worse: compilers manipulate abstract syntax trees, so the self-hosting requirement demands the source language be complex enough to handle compound data types.

This time around, I found it shockingly easy to bootstrap a compiler for stripped-down Haskell (or perhaps I should say souped-up lambda calculus), thanks to a few tricks:

  • Parsing combinators are a joy to build from scratch and a joy to use.

  • Kiselyov’s bracket abstraction algorithm is simple yet practical.

  • Interpreting basic combinators is child’s play, and almost the only task we need perform in C or assembly.

  • Lambda calculus gives us the Scott encoding for free. Thus we effortlessly gain algebraic data types.

  • Laziness is at odds with native instructions, which are eagerly evaluated. However, we can readily reconcile their differences with shrewdly chosen combinators.

Perhaps the greatest difficulty was mustering the discipline to mark the road taken so that anyone with a C compiler can follow.

How to build

The earliest generations of our compiler reside in vm.c:

$ cc -O2 vm.c -o vm

When run without arguments, this program gets each compiler to compile its successor, which results in a series of numbers that we output to raw:

$ ./vm > raw

These numbers are a compiled form of the barely.hs compiler. The vm run command reads this raw file and interprets it to compile a given Haskell file:

$ echo "prependH s = 'H':s;" > /tmp/example.hs
$ echo "ello, World!" | ./vm run /tmp/example.hs

It only accepts code that type-checks. Moreover, the (⇐) operator is undefined and disallows mixing the (++) operator with (:) unless its fixity has been declared. This conflicts with the examples bundled with my IOCCC entry. The vm ioccc command inserts code to fix these issues:

$ ./vm ioccc fib.hs

The compilers thus far expect pure functions as input. The last function should have type String → String, and we implicitly wrap it in the interact function from the standard Haskell Prelude during compilation.

The effectively.hs compiler bucks the trend. It assumes the last function has type IO (), and treats it like main in a standard Haskell program. It also has support for FFI.

The lonely.hs compiler is the first generation with a main function of type IO (). It also outputs C code that should be appended to rts.c.

In sum, after running the above to produce raw, we can build a standalone compiler with:

$ (cat rts.c; ./vm run effectively.hs < lonely.hs) > lonely.c
$ cc -O2 lonely.c -o lonely

(A few generations later, our virtually.hs compiler bundles the runtime system with the output, so the cat is no longer needed.)