This is my attempt at writing an understandable implementation of a subset of Haskell, in Haskell itself.
This project can be built with cabal build
, or installed
with cabal install
. Installation should put an executable named
haskell-in-haskell
on your path. Otherwise, you can run the project
directly with cabal run haskell-in-haskell --
, followed by the arguments
you'd like to pass.
To compile a Haskell file, just run:
haskell-in-haskell compile in.hs out.c
in.hs
is the Haskell input file, and out.c
is the C code to generate.
The compiler only works with single Haskell files. To run the generated code,
you'll need to have a C compiler handy, and to make sure that
the runtime.c
file is in the same directory, before running:
gcc -std=c99 out.c
(Replacing gcc
by your C compiler of choice)
The C generated should conform to the C99 standard. You can also
pass -DDEBUG
to C compiler, which will enable some debug printing.
This will print out some Garbage Collection information, and a "stack trace"
of program execution:
hs__entry
indirection_entry
hs_main
indirection_entry
hs_foo
hs__S_add
hs_foo__S__S_2
hs_increment
hs_foo__S__S_2__S__S_3
hs_increment_case_0
GC Done. 0x00040 ↓ 0x00040 ↑ 0x000C0
hs__S_add
hs_foo_x
hs__S_add_case_0
hs_increment_case_0__S__S_0
hs__S_add_case_0_case_0
hs__S_add_case_0
hs_foo_x
hs__S_add_case_0_case_0
hs__entry_case_0
You can also see the compiler's output after various stages, so:
This will print the tokens after running the Lexer stage
haskell-in-haskell lex in.hs
This will print the parsed AST:
haskell-in-haskell parse in.hs
This will print the simplified AST:
haskell-in-haskell simplify in.hs
This will print the AST after type-checking:
haskell-in-haskell type in.hs
This will print the "STG" for your code, which is an intermediate functional IR:
haskell-in-haskell stg in.hs
Finally, this will print the "CMM", which is the last stage before generating C code:
haskell-in-haskell cmm in.hs
The first big missing thing is IO
. The way main
works in this version
of Haskell is that it must either be main :: Int
, or main :: String
,
and the program evaluates main
before printing out its value.
In terms of features we have:
Your standard top level definitions and bindings:
x :: Int
x = 33
y :: String
y = 4
z :: Int -> Int
z 3 = 0
z _ = 4
We have let expressions:
x =
let z = 3
y = 4
in z + y
As well as where expressions:
x = z + y
where
z = 3
y = 4
Your standard arithmetic operators:
x = 1 + 3 / 45 * 6 - 34
Strings, and concatenation:
x = "foo" ++ "bar"
Booleans:
true :: Bool
true = True
false :: Bool
false = False
z :: Bool
z = 3 > 4
If expressions:
x = if 1 > 2 then 3 else 5
Case expressions:
case z of
1 -> 3
x -> x
_ -> 4
case x of
"foo" -> "bar"
_ -> "baz"
This extends to your standard multiple function heads:
add 0 0 = 0
add n 0 = n
add n m = add n (m - 1)
We have polymorphic functions:
id :: a -> a
id x = x
As well as your standard polymorphic ADTs:
data List a = Cons a (List a) | Nil
length :: List a -> Int
length Nil = 0
length (Cons _ rest) = 1 + length rest
Finally, we have type synonyms, which can't be polymorphic:
type ListInt = List X
type X = Int
Everything beyond that has not been implemented. I'd say this is a
respectable chunk of Haskell 98, but some glaring omissions include
IO
, modules, and typeclasses, as well as the entire standard library.
In terms of the implementation, this is based of the 1992 STG
paper,
so we have lazy evaluation, garbage collection, and updates, to avoid
evaluating closures twice.
The /examples
directory is a bit poor, but the /integration_tests
directory should have a few more examples of things accepted by the compiler.
If you want to help fix a bug, a first recommendation is to use the following compilation options for your C output:
gcc -DDEBUG -g -fsanitize=address -fsanitize=undefined -std=c99 .output.c
This will enable asan
and ubsan
, which provide runtime checks
against common errors with C. This is immensely useful in catching
bad usage of memory as early as possible.
This is the source of most bugs in the compiler.
There are two forms of tests. Some stages have basic tests, which can be run with
cabal test
but these mainly cover the frontend of the compiler.
A more extensive suite of tests can be run with:
python3 integration_tests.py
This script will go over all the Haskell files in integration_tests/
,
and run the full compiler over them, and then run gcc
over the outputted
C, and then run the resulting executable, making sure that the output
matches the expected results.
For this to work, you need to have gcc
available, along with ubsan
and asan
.
These are used in order to detect memory issues early in the tests.
This suite is slower, but much more effective at finding problems.
I'm currently writing a series going through the implementation of this compiler in depth, so I'd recommend looking through that if you want to understand more.
https://www.microsoft.com/en-us/research/wp-content/uploads/1992/04/spineless-tagless-gmachine.pdf https://homepages.inf.ed.ac.uk/wadler/papers/pattern/pattern.pdf http://moscova.inria.fr/~maranget/papers/ml05e-maranget.pdf