Miking DPPL is a framework for developing probabilistic programming languages (PPLs) using Miking. Currently, the framework includes the PPLs CorePPL and RootPPL.
The main binary for Miking DPPL is called midppl
. Currently, this binary is built by running make
in the project root, and installed to $HOME/.local/bin
with make install
(uninstall is also possible through make uninstall
). You must have Miking installed (the mi
command must be globally available) for building midppl
.
The command make test
executes a set of tests for various components in the Miking DPPL framework.
CorePPL extends MExpr (see Miking) with probabilistic constructs for defining random variables and for likelihood updating. For example, the program
mexpr
let x = assume (Beta 10.0 5.0) in
observe true (Bernoulli x);
x
encodes a simple distribution for the bias of a coin, by setting a Beta prior for the probability of observing heads (i.e., true
) for a single flip of the coin.
Defining random variables in CorePPL is done by providing a probability distribution to the assume
construct.
Currently, there is no generated documentation for available distributions.
Likelihood updating is done through the weight
and observe
constructs.
With weight
, the logarithm of the likelihood is updated directly (e.g., weight (log 2)
multiplies the likelihood with 2).
With observe
, the likelihood is instead updated with the value of the pmf or pdf for the given distribution at the given observation.
For example observe true (Bernoulli 0.5)
updates the likelihood with a factor of 0.5.
One option to run inference over a CorePPL program is to compile it to RootPPL.
A CorePPL program in a file cpplprog.mc
can be compiled to a RootPPL file out.cu
using the command midppl -m rootppl-smc cpplprog.mc
.
The file out.cu
can then be compiled using, e.g., rootppl out.cu --stack_size 10000
.
The stack size option is mandatory for compiling RootPPL programs compiled from CorePPL.
More information about RootPPL can be found at RootPPL.
RootPPL is an intermediate language for representing probabilistic models and comes with a framework that performs inference on the GPU in these models. These models are currently hard-coded, see examples in the folder rootppl/models
. The idea is that high-level Miking probabilistic programming languages should be compiled to this intermediate language.
The instructions below are tested on Ubuntu 18.04 but should work for other Linux distributions, Windows, and Mac.
Before building RootPPL programs, a C++/CUDA compiler is required. RootPPL works on CPU and Nvidia GPU:s. For the CPU version, a C++ compiler should suffice, e.g. g++. In order to build for GPU, CUDA must be installed. See CUDA installation guides: Linux, Windows, Mac.
To run the CPU version in parallel, OpenMP must be installed. OpenMP comes with recent gcc versions. However, OpenMP is not necessary if one does not want to execute programs in parallel on the CPU.
To install rootppl
for the current user, first clone this repository and change directory to the rootppl
folder. Then run:
make install
This will install rootppl
to $HOME/.local/bin
with resources copied to $HOME/.local/lib/rootppl
. Some systems, e.g. Mac OS, will require manually adding $HOME/.local/bin
to $PATH
.
Note: before building and after installation, especially if you are re-installing,
execute a rootppl clean
, first.
To compile a model and build an executable:
rootppl path/to/model.cu
This will compile the model along with the inference framework for CPU. To compile it for GPU, add your GPU:s compute capability to the arch variable.
You can find your GPU:s compute capability in the Wikipedia table.
Here is an example that will compile the airplane example for a GPU with a minimum compute capability of 7.5
(simply remove --arch 75
to compile for CPU):
rootppl models/airplane/airplane.cu --arch 75 -j 5
The optional argument -j x
speeds up the compilation process by spawning x
jobs, allowing for parallel compilation.
The corresponding parallel CPU (OpenMP) example is:
rootppl models/airplane/airplane.cu --omp -j 5
Alternatively, the c++ compiler can be specified with CXX. This is often required on Mac OS to enable OpenMP, by using g++ instead of the default clang. On Mac OS, g++ can be installed with e.g. brew install gcc
. Then, assuming the version installed was gcc-10
:
rootppl models/airplane/airplane.cu --omp -j 5 --cxx g++-10
The first build will compile the entire inference framework and can take 20 seconds or so when building for GPU. (Run rootppl
with the -j num_threads
to use multiple threads and speed up the build).
Note that if the inference framework is compiled for GPU, and then the model is compiled for CPU,
there will be errors. So always perform a rootppl clean
before switching between CPU and GPU. The same goes for switching to/from OpenMP.
This should generate an executable named program
(add -o <exec_name>
to name it differently). Execute it with ./program num_particles
. For example:
./program 1000
An example output of this:
./program 1000
-119.422143
Num particles close to target: 98.9%, MinX: 56.2349, MaxX: 91.1572
First is the command that is executed, it executes the executable program
with program argument 1000
.
The first row is the normalizing constant. The second row is a print statement within the model.
You can supply a second argument to program
, which is the number of sweeps:
./program 1000 3
-115.482724
Num particles close to target: 89.7%, MinX: 51.5005, MaxX: 88.2039
-115.472101
Num particles close to target: 90.5%, MinX: 53.0577, MaxX: 90.0298
-115.496258
Num particles close to target: 88.7%, MinX: 48.1773, MaxX: 89.0957
Models are divided into fragments to enable pausing the execution within models.
These fragments are functions referred to as basic blocks (BBLOCK
).
To control the program execution flow, a program counter (PC
) can be modified.
If it remains unchanged in when the basic block returns, resampling will be done, and then
the same block will be executed again. The program counter corresponds to the index of the basic block
to be executed. So, incrementing it means that the next block will be executed after resampling. An
example of this can be seen in the example below. However, if no following blocks are defined,
the inference will terminate as the model program has been executed.
Any interesting model will contain random choices, i.e. sampling from distributions.
Below is a coin flip example which flips a biased coin. SAMPLE
is a macro that can take variable number of arguments.
First is the distribution name, followed by the distribution specific arguments.
Sampling is done differently on the CPU and the GPU,but these differences are hidden in the model with the help
of this SAMPLE
macro. All of these RootPPL keywords are macros similar to SAMPLE
, they provide higher-level
constructs that can be used in the same way regardless of if the model will be executed on the CPU or the GPU.
BBLOCK(coinFlip, {
int coinSample = SAMPLE(bernoulli, 0.6);
PC++;
})
In the coin flip example above, there is one problem, however. The sample is only stored in a local
variable and is never used. To store data that remains when the next block is executed, the program
state (PSTATE
) should be used. Before defining the blocks, the model must be
initialized with the macro INIT_MODEL
that takes two arguments. First the type of the program
state (this could be any type, e.g. int
or a structure), then the number of basic blocks in
the program. So, adding it to the above example:
INIT_MODEL(int, 1)
BBLOCK(coinFlip, {
PSTATE = SAMPLE(bernoulli, 0.6);
PC++;
})
Now to run this program, only one thing remains to be defined, the main function.
This is done with the MAIN
macro, taking a code block as argument.
The code below shows what we need in order to run the program and perform SMC.
MAIN({
ADD_BBLOCK(coinFlip);
SMC(NULL);
})
First, the block must be added to the array of blocks to be executed. The order in which
blocks are added with ADD_BLOCK
, defines their order in the array and thus
defines the order of execution together with the program counter. Secondly, the
SMC
macro (its parameter is explained below) starts the inference.
Now the model can be compiled and executed! However, the result of the coin flips is
stored, but never used. To aggregate the results of all the particles' samples,
a callback function (CALLBACK
) can be used. The callback will be called after
inference, but before clean up of the particles. This way, the results can be used to
generate desired distributions or values before particles are deleted.
Within the CALLBACK
macro, the array of program states is accessed
with PSTATES
and the number of particles is accessed with the
parameter N
(which is hidden within the macro). Also, the log-weights array can be accessed with
the WEIGHTS
macro. These weights are normalised (so that the exponent of these log-weights sum to 1).
Here is an example callback, called "sampleMean" that calculates and prints the mean
of the samples.
CALLBACK(sampleMean, {
double sum = 0;
for(int i = 0; i < N; i++)
sum += PSTATES[i];
double mean = sum / N;
printf("Sample mean: %f\n", mean);
})
For this to be used, the SMC
macro call in main must be
changed to: SMC(sampleMean);
This example can be found in
rootppl/models/simple-examples/coin_flip_mean.cu
and, being in the rootppl
directory, compiled with:
rootppl models/simple-examples/coin_flip_mean.cu
Then it can be executed with the executable followed by the
number of particles, for example: ./program 1000
.
An example output is then:
Sample mean: 0.608000
log normalization constant = 0.000000
First we see our callback function's output. Then on the next line, is the logarithm of the normalization constant approximated by the inference. This is simply 0 here, since the model contains no statements that alter the weights of the particles.
Below follows some examples, these are all models that are defined within one single block.
Full example: rootppl/models/simple-examples/coin_flip.cu
In this model, a bias for a coin is sampled from the prior beta distribution. Then we observe that the coin flip is true. This model thus infers the posterior distribution of the bias, conditioned on the observation.
BBLOCK(coinFlip, {
double x = SAMPLE(beta, 2, 2);
OBSERVE(bernoulli, x, true);
PSTATE = x;
PC++;
})
Full example: rootppl/models/simple-examples/mixture.cu
This model demonstrates an example of stochastic branching, meaning that different code is executed depending on the outcome of the sample.
BBLOCK(mixture, {
double x;
if(SAMPLE(bernoulli, 0.7))
x = SAMPLE(normal, -2, 1);
else
x = SAMPLE(normal, 3, 1);
PSTATE = x;
PC++;
})
Full example: rootppl/models/simple-examples/geometric_recursive.cu
This model combines stochastic branching with recursion. Basic blocks do not fully support recursion themselves, as they take no custom arguments or return values. Instead, a helper function is used to express the recursive model:
BBLOCK_HELPER(geometricRecursive, {
if(SAMPLE(bernoulli, p))
return 1;
else
return BBLOCK_CALL(geometricRecursive, p) + 1;
}, int, double p)
BBLOCK(geometric, {
PSTATE = BBLOCK_CALL(geometricRecursive, 0.6);
PC++;
})
Note that the helper function takes its return value and parameters comma-separated after the function body.
While recursive functions is supported by CUDA, iterative solutions are encouraged. Below is the same model, implemented with a loop instead.
Full example: rootppl/models/simple-examples/geometric_iterative.cu
BBLOCK(geometric, {
int numFlips = 1;
while(! SAMPLE(bernoulli, 0.6))
numFlips++;
PSTATE = numFlips;
PC++;
})
More sophisticated models can be found in the phylogenetics directory. These probabilistic models to inference on observed phylogenetic trees. These observed trees can be found in phylogenetics/tree-utils/trees.cuh. The correct phylogenetic models contain a link in the top of the file to the WebPPL source code used as reference when implementing them. These models contain a number of new things, e.g.:
- Multiple basic blocks
- Structures as program states
- Recursive simulations
- Global data used by the model
- Resampling throughout the traversal of the observed tree
In phylogenetics/crbd, the constant-rate birth-death models can be found. The most interesting file here is crbd.cu as it is a correct model used by evolutionary biologists. This model uses a pre-processed DFS traversal path over the observed tree, rather than using a call stack.
Further phylogenetic models are found in the package
phyrppl
.
The architecture's main goal is to decouple the inference from the probabilistic models, this is the case for probabilistic
programming languages. In the inference
directory, the code for inference methods can be found. Currently, only SMC is supported.
Within the smc
directory, the resampling
directory contain resampling strategies. Currently only Systematic Resampling is supported. The resampling has
a parallel implementation for the GPU, and a sequential and parallel implementation for the CPU.
- The
dists
directory contains distributions and scoring functions. Both models and inference can rely on these. - The
macros
directory contains all the macros/constructs that are mostly used by models but also in inference code. - The
utils
directory contains code that RootPPL uses, but can be used by models as well. - The
models
directory contains example models that use the inference.
To compile with the progStateStack_t
program state, the macro INIT_MODEL_STACK(num_bblocks)
should be used instead of INIT_MODEL(progState_t, num_bblocks)
.
Then the stack size is specified in bytes with --stack_size num_bytes
, e.g. rootppl my_stack_model.cu --stack_size 10000
.
TODO, stuff not yet demonstrated explicitly in README:
- WEIGHT macro
- Multiple BBLOCK models
- Global data accessible by bblocks on GPU
- Program States containing more than a primitive datatype, e.g. structs.
7/7/2021: added BetaBernoulli distribution