This repository contains the prototype compiler for my master's thesis.
It is a proof-of-concept implementation for a modified version of the C-- language showcasing type inference in the context of systems programming and inference-guided automatic resource management.
- cabal (version 3.2+; preferably version 3.6.2 acquired through ghcup)
- ghc (version 8.8.4 - 9.0.2; preferably version 8.10.7 acquired through ghcup)
- llc (version 12+; preferably version 12)
Two modified cabal packages included as git submodules in the vendor
directory:
- llvm-hs-pure from https://github.com/jiriklepl/llvm-hs, modified version of https://github.com/llvm-hs/llvm-hs
- llvm-hs-pretty from https://github.com/jiriklepl/llvm-hs-pretty, modified version of https://github.com/llvm-hs/llvm-hs-pretty
To get the repository:
git clone https://github.com/jiriklepl/masters-thesis-code.git
cd masters-thesis-code
To build the project:
git submodule init
git submodule update
cabal build --only-dependencies Compiler
cabal build Compiler
# development documentation:
cabal haddock Compiler
For general use:
cabal run Compiler -- [options...] input_file
For /dev/stdout
output, use:
cabal run Compiler -- -o - input_file
The usage of the compiler is explained by the standardized --help
:
cabal run Compiler -- -h
For running the testing examples in the folder examples/
:
./run_examples.sh
Some of the examples test the compiler for invalid input, which will cause several error messages to be printed out during the process.
The source code is documented with Haddock. You can build the HTML documentation by running:
cabal haddock Compiler
This command is quite verbose but its output ends with the location of the index file of the generated documentation (it should be somewhere in the dist-newstyle
folder, most likely in dist-newstyle/build/<arch>/<ghc-version>/Compiler-0.1.0.0/x/Compiler/doc/html/Compiler/Compiler/index.html
).
When reading the source files of the program, we suggest using Haskell Language Server (HLS), which parses the documentation comments and makes the documentation more easily accessible.
The project contains many modules of varying significance documented with Haddock, here we list the main ones:
CMM.Pipeline
: contains the high-level logic of the compiler and wrappers for the main phases of the compiler pipelineCMM.Lexer
: contains the definition of the tokenization phaseCMM.Parser
: contains the definition of the parsing phaseCMM.AST
: contains the definitions of various abstract syntactic tree (AST) nodes used as an representation of the programCMM.AST.Flattener
: defines the functionflatten
, which flattens the given ASTCMM.AST.Blockifier
: defines the functionblockify
, which blockifies the procedures in the given AST, annotating each statement with a block annotation (that assigns the given statement to a corresponding basic block). It also produces theBlockifierState
, which is refined byCMM.FlowAnalysis
CMM.FlowAnalysis
: defines the flow analysis for a given procedure. It is issued byCMM.Blockifier
and refines itsBlockifierState
CMM.Inference.Preprocess
: contains the definition of the functionpreprocess
, which performs the inference preprocessing phase: elaborates the AST and generates the constraints that represent the type semantics of the programCMM.Inference
: defines the functionreduce
, which performs the inference pipeline on the given set of constraints (facts), producing anInferencerState
that can interpret types of the elaborated AST- The various data are defined in:
CMM.Inference.Type
,CMM.Inference.TypeCompl
,CMM.Inference.TypeKind
,CMM.Inference.Properties
,CMM.Inference.Fact
,CMM.Inference.DataKind
,CMM.Inference.Constness
- The various data are defined in:
CMM.Monomorphize
: contains the definition of the functionmonomorphize
, which monomorphizes the given program represented by elaborated AST. It uses theInferencerState
to interpret each typeCMM.FillElabs
,CMM.Mangle
: these two modules define the postprocessing of monomorphized code - filling-in concrete types in place of type variables according and name-mangling of monomorphic copies of polymorphic top-level definitionsCMM.Translator
: contains the definition of the functiontranslate
, which performs the translation phase on an elaborated and blockified AST - emission of the LLVM assembly. It usesBlockifierState
andInferencerState
to interpret the control flow and types, respectively
Modules LLVM.*
in directory vendor/*
are from modified versions of packages llvm-hs-pure and llvm-hs-pretty
The source files are formatted by hindent
and checked by hlint
(should not produce any hints). The source should compile without warnings (tested with ghc-8.10.7). The script .\run_examples.sh
should compile all example files and successfully interpret them by llc
.