/lc

simple lisp interpreter in C++

Primary language: C++. License: MIT.

Simple Lisp compiler/VM (with optional JIT) in C++

Cell types: Nil, Pair, Int, String, Lambda + 2 internal types: InstructionPointer, Environment

64 bit Cell format:

  • Any cell type: .... .... ........ ........ ........ ........ ........ ........ ........
    • 4 bits type | 60 bits data
  • Nil cell
    • 0000 | 60 bits unused
  • Pair cell
    • 0001 | 30 bit heap address (left) | 30 bit heap address (right)
  • Integer cell
    • 0010 | 60 bits integer
  • String cell
    • 0011 | 4 bits unused | 56 bits holding a 7-byte string (up to 6 characters + trailing zero)
  • Lambda cell
    • 0100 | 32 bit lambda address (left) | 28 bit heap address (the lambda's bound environment, mainly for closures)
  • The InstructionPointer and Environment special types exist because the CALL and RET instructions save/restore a return address and an environment pointer on/from the same stack that holds the actual data.
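
The layout above can be illustrated with a small packing sketch. This is not the code from the repository: every name here (CellType, make_cell, make_pair, make_int and so on) is hypothetical, and treating the 60-bit integer as signed two's complement is an assumption. Only the tag values and field widths are taken from the list above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; only the 4-bit tag + 60-bit data layout comes from the README.
using Cell = std::uint64_t;

enum class CellType : std::uint64_t { Nil = 0, Pair = 1, Int = 2, String = 3, Lambda = 4 };

constexpr int kDataBits = 60;
constexpr Cell kDataMask = (Cell{1} << kDataBits) - 1;  // low 60 bits
constexpr Cell kAddr30Mask = (Cell{1} << 30) - 1;       // one 30-bit heap address

// Puts the type tag in the top 4 bits and the payload in the low 60 bits.
constexpr Cell make_cell(CellType type, Cell data) {
    return (static_cast<Cell>(type) << kDataBits) | (data & kDataMask);
}

constexpr CellType type_of(Cell c) {
    return static_cast<CellType>(c >> kDataBits);
}

// Pair cell: 0001 | 30-bit left address | 30-bit right address.
inline Cell make_pair(std::uint32_t left, std::uint32_t right) {
    assert(left <= kAddr30Mask && right <= kAddr30Mask);
    return make_cell(CellType::Pair, (Cell{left} << 30) | Cell{right});
}

inline std::uint32_t pair_left(Cell c) {
    return static_cast<std::uint32_t>((c >> 30) & kAddr30Mask);
}

inline std::uint32_t pair_right(Cell c) {
    return static_cast<std::uint32_t>(c & kAddr30Mask);
}

// Integer cell: 0010 | 60-bit integer (assumed signed, two's complement).
inline Cell make_int(std::int64_t value) {
    return make_cell(CellType::Int, static_cast<Cell>(value));
}

inline std::int64_t int_value(Cell c) {
    Cell data = c & kDataMask;
    if (data & (Cell{1} << (kDataBits - 1))) {
        data |= ~kDataMask;  // sign-extend bit 59 into the top 4 bits
    }
    return static_cast<std::int64_t>(data);
}
```

With these helpers, make_pair(1, 2) yields 0x1000000040000002: tag 0001 in the top nibble, address 1 shifted into the left field, and address 2 in the right field.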

main.cc:

Because a cell of type String can hold at most 6 characters (plus a trailing zero), symbol names are limited to 6 bytes. It should be possible to mangle or shorten names during compilation, or to support constant pools in the VM, but this is not implemented.
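
To make the 6-character limit concrete, here is a rough sketch of packing a symbol name into a String cell. The function names (pack_symbol, unpack_symbol) and the byte order inside the 56-bit field are assumptions made for illustration; the real main.cc may lay the bytes out differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

using Cell = std::uint64_t;

constexpr Cell kStringTag = Cell{0x3} << 60;  // 0011 in the top 4 bits
constexpr std::size_t kMaxSymbolLen = 6;      // 7 bytes minus the trailing zero

// Packs `name` into the low 56 bits, byte i at bits 8*i..8*i+7 (assumed order).
// Returns false when the name does not fit, which is the limitation described above.
inline bool pack_symbol(const std::string& name, Cell& out) {
    if (name.size() > kMaxSymbolLen) {
        return false;
    }
    Cell data = 0;
    for (std::size_t i = 0; i < name.size(); ++i) {
        data |= Cell{static_cast<unsigned char>(name[i])} << (8 * i);
    }
    out = kStringTag | data;
    return true;
}

// Reads bytes back until the trailing zero (or until all 7 bytes are consumed).
inline std::string unpack_symbol(Cell c) {
    std::string result;
    for (std::size_t i = 0; i < 7; ++i) {
        const char ch = static_cast<char>((c >> (8 * i)) & 0xFF);
        if (ch == '\0') {
            break;
        }
        result.push_back(ch);
    }
    return result;
}
```

Under this scheme a name such as "define" (6 characters) fits exactly, while a 7-character name is rejected.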

Compiles pseudo-Lisp code to bytecode.

parse_list("(+ 2 (- 3 1))") 

returns a Cell object that contains a list of other Cell objects, representing the tree structure of the code. A Cell object can be compiled to bytecode using predefined cases for the supported special forms:

+-*/%, less, eq, cons, car, cdr, define, func?, str?, int?, null?, begin, cond, lambda and gc
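
The tree that parse_list returns can be pictured with a small recursive-descent sketch. It is only an illustration: it builds a hypothetical Node struct rather than the project's Cell objects, and the names Node, parse_expr, parse_list_sketch and show do not come from main.cc.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tree node: either an atom (its text) or a list of child nodes.
struct Node {
    bool is_list = false;
    std::string atom;
    std::vector<Node> children;
};

inline void skip_spaces(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
}

// Parses one expression starting at `pos` and advances `pos` past it.
inline Node parse_expr(const std::string& s, std::size_t& pos) {
    skip_spaces(s, pos);
    if (pos >= s.size()) throw std::runtime_error("unexpected end of input");
    Node node;
    if (s[pos] == '(') {
        node.is_list = true;
        ++pos;  // consume '('
        for (;;) {
            skip_spaces(s, pos);
            if (pos >= s.size()) throw std::runtime_error("missing ')'");
            if (s[pos] == ')') {
                ++pos;  // consume ')'
                break;
            }
            node.children.push_back(parse_expr(s, pos));
        }
        return node;
    }
    if (s[pos] == ')') throw std::runtime_error("unexpected ')'");
    const std::size_t start = pos;
    while (pos < s.size() && s[pos] != '(' && s[pos] != ')' &&
           !std::isspace(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    node.atom = s.substr(start, pos - start);
    return node;
}

inline Node parse_list_sketch(const std::string& text) {
    std::size_t pos = 0;
    return parse_expr(text, pos);
}

// Prints the tree back in a canonical single-space form.
inline std::string show(const Node& n) {
    if (!n.is_list) return n.atom;
    std::string out = "(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i != 0) out += ' ';
        out += show(n.children[i]);
    }
    return out + ")";
}
```

For "(+ 2 (- 3 1))" this produces a list node with three children: the atoms + and 2, and a nested list for (- 3 1). A compiler pass would then walk such a tree and dispatch on the first element of each list.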

vm.cc:

Either interprets bytecode directly (no -j command-line argument) or generates x86 native code using libjit (-j command-line argument). The VM class represents a virtual machine with a stack, a heap and a special 'env' pointer register. The sizes of both the stack and the heap are hard-coded at the beginning of vm.cc.

The VM class contains two functions that execute code, step_interpret and step_jit. Both are called from the VM::run function for each instruction. VM::step_interpret interprets an instruction and returns, while step_jit generates a piece of native code; once all instructions have been consumed, VM::run compiles and executes the generated code.

The VM class implements simple stop-and-collect garbage collection: a mark-and-sweep algorithm that moves/compacts used cells from one half of the heap to the other. Only three instructions can lead to heap growth (CONS, DEF and STOREENV), so both step_interpret and step_jit check whether the heap pointer is approaching the end of the current half of the heap and call VM::gc() automatically. Alternatively, gc can be run manually by calling the (gc) special form or generating a GC instruction.
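
The two-half heap idea can be sketched as follows. This is not the VM::gc() implementation: it uses a Cheney-style copying pass as one plausible way to move live cells from one half to the other, and all names (TwoHalfHeap, PairCell, nearly_full, collect) and the simplified cell struct are hypothetical. The roots vector stands in for the VM stack and the 'env' register.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

constexpr std::uint32_t kNone = 0xFFFFFFFFu;  // "no child" marker (hypothetical)

// Simplified heap cell: two child indices plus a payload.
struct PairCell {
    std::uint32_t left;
    std::uint32_t right;
    int payload;
};

class TwoHalfHeap {
public:
    explicit TwoHalfHeap(std::size_t half_size)
        : half_(half_size), cells_(2 * half_size), base_(0), next_(0) {}

    // True when fewer than `margin` + 1 free cells remain in the active half.
    bool nearly_full(std::size_t margin) const { return next_ + margin >= base_ + half_; }

    std::size_t used() const { return next_ - base_; }
    const PairCell& at(std::uint32_t index) const { return cells_[index]; }

    // Bump allocation inside the active half; the caller checks nearly_full() first.
    std::uint32_t alloc(int payload, std::uint32_t left = kNone, std::uint32_t right = kNone) {
        if (next_ >= base_ + half_) throw std::bad_alloc();
        cells_[next_] = PairCell{left, right, payload};
        return static_cast<std::uint32_t>(next_++);
    }

    // Copies every cell reachable from `roots` into the other half, compacting them,
    // and rewrites the roots and child indices to the new locations.
    void collect(std::vector<std::uint32_t>& roots) {
        const std::size_t to_base = (base_ == 0) ? half_ : 0;
        std::size_t to_next = to_base;
        std::vector<std::uint32_t> forward(cells_.size(), kNone);  // old index -> new index

        auto copy = [&](std::uint32_t old_index) -> std::uint32_t {
            if (old_index == kNone) return kNone;
            if (forward[old_index] == kNone) {
                cells_[to_next] = cells_[old_index];
                forward[old_index] = static_cast<std::uint32_t>(to_next++);
            }
            return forward[old_index];
        };

        for (std::uint32_t& root : roots) root = copy(root);
        // Cells between `scan` and `to_next` still hold old child indices; fix them up.
        for (std::size_t scan = to_base; scan < to_next; ++scan) {
            cells_[scan].left = copy(cells_[scan].left);
            cells_[scan].right = copy(cells_[scan].right);
        }
        base_ = to_base;
        next_ = to_next;
    }

private:
    std::size_t half_;
    std::vector<PairCell> cells_;
    std::size_t base_;  // start of the active half
    std::size_t next_;  // next free slot in the active half
};
```

A step function in this style would test nearly_full() before executing an allocating instruction and call collect() with the current roots, which mirrors the automatic check described above.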

Usage example:

./main < edigits.lsp | ./vm -j