C++ scripts for reproducing the numerical results of the paper "The semi-hierarchical Dirichlet Process and its application to clustering homogeneous distributions" (Bayesian Analysis, 2021)
- Clone this repo with its submodules:
```shell
git clone --recurse-submodules https://github.com/mberaha/semihdp-scripts.git
```
- Build the executable:
```shell
mkdir build
cd build
cmake ..
make run_from_file
```
From the root of the repository you can call
```shell
./build/run_from_file \
    DATASET_FILE.csv \
    semihdp_params.asciipb \
    CHAINS_FILE.recordio \
    LATENT_VARS_FILE.csv \
    DENSITY_GRID.csv \
    PATH_TO_OUTPUT_DENSITIES
```
where
- `DATASET_FILE.csv` is the path to a CSV file with two columns, the group id and the observation (no header);
- `semihdp_params.asciipb` contains all the prior hyperparameters; it is in the root folder of the repo;
- `CHAINS_FILE.recordio` is where the MCMC chains will be stored (as a sequence of serialized protocol buffers);
- `LATENT_VARS_FILE.csv` is where the latent variables associated with each observation will be saved, which is useful to identify the clusters. This will be a CSV file with four columns: `[iteration_number, group_id, mean, var]`;
- `DENSITY_GRID.csv` is a CSV file with the grid over which to evaluate the (log) density. The grid is common to all the groups;
- `PATH_TO_OUTPUT_DENSITIES` is a path where one CSV file for each group will be created. Each file stores the mixture density evaluated on the grid at each iteration of the MCMC chain (one row per iteration).
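To illustrate the expected input formats, here is a minimal sketch that writes a toy dataset and a density grid. The `toy/` directory, the values and the use of 0-based integer group ids are hypothetical. We also assume that `DENSITY_GRID.csv` holds one grid point per line; check this against `example/xgrid.csv` before relying on it.

```shell
#!/bin/sh
# Minimal sketch: write toy inputs for run_from_file.
# Hypothetical: the toy/ directory, the values, and 0-based group ids.
set -e
mkdir -p toy

# DATASET_FILE.csv: two columns (group id, observation), no header.
cat > toy/data.csv <<'EOF'
0,-1.2
0,0.3
1,2.1
1,1.8
EOF

# DENSITY_GRID.csv: ASSUMED layout of one grid point per line,
# here 101 equally spaced points from -5.0 to 5.0.
awk 'BEGIN { for (i = 0; i <= 100; i++) printf "%.1f\n", (i - 50) / 10 }' > toy/xgrid.csv
```

The two files can then be passed as `DATASET_FILE.csv` and `DENSITY_GRID.csv`. Generating the grid with `awk` keeps the sketch free of dependencies beyond a POSIX shell.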
For instance, to run the semi-HDP on the dataset in `example/data.csv` and evaluate the density on the grid `example/xgrid.csv`:
```shell
./build/run_from_file \
    example/data.csv \
    semihdp_params.asciipb \
    example/chains.recordio \
    example/latent_vars.csv \
    example/xgrid.csv \
    example/dens
```
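The outputs can be post-processed with standard command-line tools. Below is a minimal sketch with two hypothetical helpers that are not part of the repo. `count_clusters` counts, for each MCMC iteration, the distinct `(mean, var)` pairs in the latent-variables file and treats each pair as one cluster shared across groups. `mean_density` averages one per-group density file over iterations. The sketch assumes comma-separated files, a latent-variables file with either no header or a non-numeric one, and density files that store log-densities with one row per iteration and one column per grid point. If the values are on the natural scale, replace `exp($j)` with `$j`. The names of the per-group files written to `example/dens` are not documented here, so pass the path explicitly.

```shell
#!/bin/sh
# Hypothetical post-processing helpers (not part of the repo).

# count_clusters FILE: print "iteration,n_clusters" for each MCMC iteration.
# Expects columns iteration_number,group_id,mean,var; rows whose first field
# is not an integer (e.g. a header line) are skipped. Each distinct
# (mean, var) pair within an iteration counts as one cluster; values are
# compared as printed strings.
count_clusters() {
  awk -F, '$1 ~ /^[0-9]+$/ {
      key = $1 SUBSEP $3 SUBSEP $4
      if (!(key in seen)) { seen[key] = 1; n[$1]++ }
    }
    END { for (it in n) print it "," n[it] }' "$1" | sort -t, -k1,1n
}

# mean_density FILE: average the density over iterations at each grid point.
# ASSUMES one row per iteration, one comma-separated column per grid point,
# and values on the log scale (hence exp).
mean_density() {
  awk -F, '{ for (j = 1; j <= NF; j++) s[j] += exp($j); ncol = NF }
    END { for (j = 1; j <= ncol; j++) printf "%.6g\n", s[j] / NR }' "$1"
}
```

For example, after running the command above, `count_clusters example/latent_vars.csv` prints one `iteration,n_clusters` line per iteration.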
You can use the `semihdp.py` file as a simple Python interface that hides the terminal from you.
You can move the file anywhere on your machine as long as you modify the definition of `SEMIHDP_HOME_DIR` to point to the right directory. By default it points to the directory containing `semihdp.py`; if you move the file, it should point to the directory where this README is.