- On Amazon EC2's free-tier machine (512 MB memory, 1 core), we can sample 3.6M variables/second.
- On a 2-node Amazon EC2 machine, sampling 7 billion random variables, each with 10 features, takes 3 minutes. This means we can run inference for every living human being on the planet for $15 (100 samples!).
- On a MacBook, DimmWitted runs 10x faster than DeepDive's default sampler.
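The 2-node figures above can be unpacked with a little arithmetic. The sketch below assumes that each of the 100 samples is one 3-minute pass over all 7 billion variables and that cost scales linearly with running time. The hourly price it derives is implied by those assumptions and is not a quoted EC2 rate.

```shell
#!/bin/sh
# Back-of-the-envelope check of the 2-node machine figures quoted above.
# The four inputs are taken from the text; everything derived is an estimate.

variables=7000000000   # random variables sampled per pass
pass_seconds=180       # one pass takes 3 minutes
samples=100            # passes covered by the quoted cost (assumption: 1 sample = 1 pass)
total_dollars=15       # quoted cost for all samples

# Throughput of a single pass, in variables per second (integer division).
throughput=$((variables / pass_seconds))

# Running time for all passes, in hours: 100 * 180 s = 18000 s.
total_hours=$((samples * pass_seconds / 3600))

# Implied hourly price of the machine (assumes cost is linear in time).
dollars_per_hour=$((total_dollars / total_hours))

echo "throughput: $throughput variables/second"
echo "total time: $total_hours hours"
echo "implied price: $dollars_per_hour dollars/hour"
```

Under these assumptions the 2-node machine samples roughly 38.9M variables/second, and 100 samples take about 5 hours.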
See the DimmWitted sampler page in DeepDive's documentation.
The binary format for DimmWitted's input is documented in doc/binary_format.md.
First, install build dependencies:
make -j dep
Then, build:
make -j
A modern C++ compiler is required: g++ >= 4.8 or clang++ >= 4.2.
To specify the compiler to use, set the CXX variable:
CXX=/dfs/rulk/0/czhang/software/gcc/bin/g++ make
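To check a compiler against the minimum versions before building, a small comparison helper can be enough. The function below is a hypothetical sketch, not part of the repository: `version_ge` is a name chosen for illustration, and it compares only the major and minor components of a dotted version string.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository).
# version_ge HAVE WANT succeeds if dotted version HAVE is at least WANT,
# comparing the major component first and then the minor component.
version_ge() {
  have_major=${1%%.*}
  want_major=${2%%.*}
  # Drop the major part; a bare version such as "9" gets minor 0.
  case $1 in *.*) have_rest=${1#*.} ;; *) have_rest=0 ;; esac
  case $2 in *.*) want_rest=${2#*.} ;; *) want_rest=0 ;; esac
  have_minor=${have_rest%%.*}
  want_minor=${want_rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Try a few sample version strings against the g++ minimum of 4.8.
for v in 4.7.3 4.8.2 9; do
  if version_ge "$v" 4.8; then
    echo "$v: ok"
  else
    echo "$v: too old"
  fi
done
```

For a real g++, the version string can come from `g++ -dumpversion`, for example `version_ge "$(g++ -dumpversion)" 4.8`. The clang++ minimum (4.2) can be checked the same way once its version string has been extracted.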
To test, run:
make -j test
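The three steps above (dependencies, build, tests) can be chained so that a failure stops the sequence. The script below is a hypothetical wrapper, not something shipped with the repository: the `run` and `build_all` functions and the `DRY_RUN` variable are names introduced for this sketch. It only prints the commands by default.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repository) that chains the
# documented make targets and stops at the first failure.
# Usage, assuming the file is saved as build.sh:
#   DRY_RUN=0 CXX=/path/to/g++ sh build.sh

# Dry run by default, so the sketch only prints what it would do.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command, then execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Dependencies first, then the build, then the tests. A CXX value set in
# the environment is inherited by each make invocation.
build_all() {
  run make -j dep && run make -j && run make -j test
}

build_all
```

Setting `DRY_RUN=0` makes the script execute each command instead of only printing it.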
- Follow the Google C++ Style Guide.
- Travis CI tests will fail unless you run `make format` before committing.
- Tests are written with gtest and bats.
- Command-line parsing is done with TCLAP.
- NUMA control is done with libnuma.
C. Zhang and C. Ré. DimmWitted: A study of main-memory statistical analytics. PVLDB, 2014.