This project aims to implement an extensible stack based interpreter toolkit in C++17.
#include <cassert>
#include "lgpp/ops/push.hpp"
#include "lgpp/ops/stop.hpp"
#include "lgpp/types.hpp"
#include "lgpp/vm.hpp"
using namespace lgpp;
int main() {
VM vm;
Stack& s = get_stack(vm);
emit<ops::Push>(vm, vm.Int, 42);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(pop(s, vm.Int) == 42);
return 0;
}
Custom interpreters are powerful and flexible tools that allow solving thorny problems with grace. Configurations, querys, templates, formats, formulas and scripts are some problem classes that interpreters excel in.
The general idea is that distilling the fundamental building blocks in library form makes it possible to reduce needed effort to the point where more problems start to look like custom languages, and where it's affordable to try out new ideas and throw some away.
The project requires a C++17 compiler and CMake to build.
$ git clone https://github.com/codr7/liblgpp.git
$ cd liblgpp
$ mkdir build
$ cd build
$ cmake ..
$ make test
$ ./test
Single threaded performance is around 3 times slower than Python3. While I believe that's possible to improve significantly within the constraints imposed by the current design, in the end computed goto is going to outperform function calls for the eval loop.
$ cd build
$ make bench
$ ./bench
fibrec: 645432us
coro: 384625us
$ cd bench
$ python3 fibrec.py
238096us
$ python3 coro.py
159311us
Modern C++ is a fairly complex language, but it is unique in the way it allows dialling in just the right balance between abstraction, performance and safety. I've tried my best to keep the code as simple as straight forward as the language allows, using the standard library where appropriate without getting tangled up.
One somewhat unusual aspect of the design is that it doesn't use inheritance for polymorphism, this allows more flexibility and simplifies usage.
The eval loop trades speed for simplicity, extensibility, portability and maintainability by dispatching to functions rather than using computed goto and representing operations as structs rather than packed bytecode.
Stacks are passed by value. The following set of operations is provided for stack manipulation.
Copies a number of values (default 1) from the stack starting at an offset from the end (default 0).
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::Push>(vm, vm.Int, 2);
emit<ops::Push>(vm, vm.Int, 3);
emit<ops::Cp>(vm, 2, 2);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 5);
assert(pop(s, vm.Int) == 2);
assert(pop(s, vm.Int) == 1);
assert(pop(s, vm.Int) == 3);
assert(pop(s, vm.Int) == 2);
assert(pop(s, vm.Int) == 1);
Drops a number of values (default 1) from the stack starting at an offset from the end (default 0).
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::Push>(vm, vm.Int, 2);
emit<ops::Push>(vm, vm.Int, 3);
emit<ops::Drop>(vm, 1, 2);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 1);
assert(pop(s, vm.Int) == 1);
Swaps the value on the stack at one offset from the end (default 0) with another (default 1).
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::Push>(vm, vm.Int, 2);
emit<ops::Swap>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 2);
assert(pop(s, vm.Int) == 1);
assert(pop(s, vm.Int) == 2);
Replaces the top stack value with its contents.
Stack v{{vm.Int, 1}, {vm.Int, 2}, {vm.Int, 3}};
emit<ops::Push>(vm, vm.Stack, v);
emit<ops::Splat>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 3);
assert(pop(s, vm.Int) == 3);
assert(pop(s, vm.Int) == 2);
assert(pop(s, vm.Int) == 1);
Replaces the current stack with a single value holding its contents.
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::Push>(vm, vm.Int, 2);
emit<ops::Push>(vm, vm.Int, 3);
emit<ops::Squash>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 1);
s = pop(s, vm.Stack);
assert(s.size() == 3);
assert(pop(s, vm.Int) == 3);
assert(pop(s, vm.Int) == 2);
assert(pop(s, vm.Int) == 1);
Types are passed by reference. The provided type hierarchy contains the following types, but new ones are trivial to add:
Represents nothing and has one value of type nullptr_t
.
The type of all types with values of type Trait *
.
The type of all numbers.
The type of all sequences.
Plain old ints.
Pairs are implemented as pair<Val, Val>
and passed by value.
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::Push>(vm, vm.Int, 2);
emit<ops::Zip>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 1);
auto v = pop(s);
assert(&type_of(v) == &vm.Pair);
auto p = v.as(vm.Pair);
assert(p.first.as(vm.Int) == 1 && p.second.as(vm.Int) == 2);
Pair p({vm.Int, 1}, {vm.Int, 2});
emit<ops::Push>(vm, vm.Pair, p);
emit<ops::Unzip>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 2);
assert(pop(s, vm.Int) == 2);
assert(pop(s, vm.Int) == 1);
Stacks are implemented as vector<Val>
.
Coroutines (see below).
Threads (see below).
TypeOf
replaces the top stack value with its type.
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::TypeOf>(vm);
emit<ops::Cp>(vm);
emit<ops::TypeOf>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 2);
assert(pop(s, vm.Meta) == &vm.Meta);
assert(pop(s, vm.Meta) == &vm.Int);
Isa
replaces the top two stack values with T
if the first argument is derived from the second, F
otherwise.
emit<ops::Push>(vm, vm.Meta, &vm.Int);
emit<ops::Push>(vm, vm.Meta, &vm.Num);
emit<ops::Isa>(vm);
emit<ops::Stop>(vm);
eval(vm, 0);
assert(s.size() == 1);
assert(pop(s, vm.Bool));
Coroutines are labels that support pausing/resuming calls. Since they are passed by value, copying results in a separate call chain starting at the same position.
Label target("target", emit_pc(vm));
emit<ops::Push>(vm, vm.Int, 1);
emit<ops::Push>(vm, vm.Int, 2);
emit<ops::Pause>(vm);
emit<ops::Push>(vm, vm.Int, 3);
emit<ops::Ret>(vm);
auto start_pc = emit_pc(vm);
emit<ops::StartCoro>(vm, target);
emit<ops::Resume>(vm);
emit<ops::Resume>(vm);
emit<ops::Drop>(vm);
emit<ops::Stop>(vm);
eval(vm, start_pc);
assert(pop(s, vm.Int) == 3);
assert(pop(s, vm.Int) == 2);
assert(pop(s, vm.Int) == 1);
The VM supports preemptive multithreading and has been carefully designed to minimize locking. Each thread runs in complete isolation on its own stack, which is finally pushed to the calling stack on join.
Label target("target", emit_pc(vm));
emit<ops::Push>(vm, vm.Int, 1000);
emit<ops::Sleep>(vm);
emit<ops::Stop>(vm);
auto start_pc = emit_pc(vm);
emit<ops::StartThread>(vm, target);
emit<ops::Join>(vm);
emit<ops::Stop>(vm);
eval(vm, start_pc);
assert(pop(s, vm.Stack).size() == 0);