Gibbon is an experimental compiler that transforms high-level functional programs to operate on serialized data.
Typically, programs that process tree-like data represent trees using pointer-based data structures in memory (one heap object per-leaf and per-node) because such a layout is convenient to manipulate in a high-level programming language. This is also generally distinct from the representation of the data in serialized form on disk, which means that a program must perform some sort or marshaling when working with serialized data. Gibbon unifies the in-memory and serialized formats, transforming recursive functions to operate directly on serialized data.
Additionally, while the pointer-based structure is efficient for random access and shape-changing modifications, it can be inefficient for traversals that process most or all of a tree in bulk. The Gibbon project aims to explore optimizations of recursive tree transforms by changing how trees are stored in memory.
Currently, the Gibbon compiler has multiple front-ends: an s-expression synax similar to Typed Racket, and a small subset of Haskell.
Gibbon is implemented in Haskell, and is set up to be built with Stack. After you install Stack, proceed to installing Gibbon's dependencies.
- Ubuntu:
$ sudo apt-get install libgc-dev libgmp-dev gcc-7 uthash-dev
$ sudo add-apt-repository ppa:plt/racket && sudo apt update && sudo apt install racket
- OSX:
You can install some of the dependencies using Homebrew:
$ brew install libgc gmp
Others require a few extra steps:
-
Racket: Follow the instructions on it's website
-
uthash: Clone the repository and copy all the
.h
files insrc
to/usr/local/include
After you have both Stack and all the dependencies installed, you can build Gibbon from source:
$ git clone https://github.com/iu-parfunc/gibbon
$ cd gibbon && source set_env.sh
$ cd gibbon-compiler && stack setup && stack build
At this point you can run the Gibbon executable:
$ stack exec -- gibbon -h
If you'd like to run the testsuite, you can do so with:
$ cd $GIBBONDIR && ./run_all_tests.sh
Gibbon can be run in multiple modes. The -r
flag indicates that you want to
compile and run a program:
$ gibbon -r ./demo/Add1.hs
If the extension is *.hs
, it is assumed to be a Haskell source file, and if it
is *.gib
it is a Racket source file starting with #lang gibbon
.
Just the -r
flag by itself will apply the high-level specialization optimizations
but will not use packed data. The -p
flag instructs Gibbon to use
a packed representation whenever possible. These can be combined:
$ gibbon -rp ./demo/PolyTree.hs
This primarily stores the Gibbon compiler, an implementation of a high-performance functional language.
This repository also contains a collection of sub-projects related to benchmarking tree traversals and performing tree traversals on packed representations. Here is a guide to the subdirectories:
-
gibbon-compiler - the prototype compiler for the Gibbon language of packed tree traversals.
-
gibbon - a Racket #lang for Gibbon.
-
ASTBenchmarks - benchmark of treewalks (compiler passes) on ASTs written with Gibbon. Also includes scripts to fetch input datasets.
-
BintreeBench
- a submodule containing the tiniest binary tree microbenchmark, implemented several different languages and compilers. -
core-harvest - tools to harvest realistic, large ASTs (mainly Racket) from the wild.
-
DEVLOG.md - detailed documentation for those hacking on this repository.