This repository contains the source code and additional results for the experiments on Bidirectional Helmholtz Machines (BiHMs) described in
http://arxiv.org/abs/1506.03877
The basic idea is to create a deep generative model for unsupervised learning by combining a top-down directed model P and a bottom-up directed model Q into a joint model P*. We show that P* can be trained such that P and Q serve as useful approximations to it, both when we want to sample from the model and when we want to perform inference.
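For intuition, P and Q are combined geometrically: p*(x,h) is proportional to sqrt(p(x,h) * q(x,h)). The snippet below is a small, self-contained toy illustration of this combination on a tiny discrete (x, h) space; it is not the repository's code, it just makes the normalization explicit.

```python
# Toy illustration (NOT the repository's code): combine a top-down joint
# p(x, h) and a bottom-up joint q(x, h) geometrically into
# p*(x, h) = sqrt(p(x, h) * q(x, h)) / Z, on a small discrete space
# where the normalizer Z can be computed exactly.
import numpy as np

rng = np.random.RandomState(0)

# Two arbitrary joint distributions over 4 x-values and 3 h-values.
p = rng.rand(4, 3); p /= p.sum()
q = rng.rand(4, 3); q /= q.sum()

# Geometric combination and its normalizing constant Z.
unnormalized = np.sqrt(p * q)
Z = unnormalized.sum()
p_star = unnormalized / Z

# Marginal p*(x): this is the distribution that sampling and
# likelihood evaluation refer to.
p_star_x = p_star.sum(axis=1)
print("Z =", Z, " p*(x) =", p_star_x)
```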
We generally observe that BiHMs prefer deep architectures with many layers of latent variables. For example, our best model for the binarized MNIST dataset has 12 layers with 300, 200, 100, 75, 50, 35, 30, 25, 20, 15, 10, 10 binary latent units. This model reaches an estimated test set log-likelihood of approximately -84.8 nats.
The left image shows 100 random samples from the top-down model P; the right image shows the result of starting from these samples and running 250 Gibbs MCMC steps to approximately sample from P*: the digits become crisper and of higher quality. (We visualize the per-pixel Bernoulli probabilities instead of sampling from them.)
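The refinement loop can be sketched roughly as follows. This is a hedged illustration, not the repository's implementation: `model.sample_p`, `model.sample_h_given_x` and `model.sample_x_given_h` are hypothetical helpers standing in for sampling from P's prior, from the bottom-up model Q, and from P's conditional over the visibles.

```python
# Hedged sketch of the sample-refinement loop described above; the helper
# methods are hypothetical and the actual code in this repository may differ.
def refine_samples(model, n_samples=100, n_steps=250):
    # Start from samples drawn from the top-down model P.
    x = model.sample_p(n_samples)
    for _ in range(n_steps):
        # Alternate between inferring latents with Q and regenerating the
        # visibles with P; iterating this moves the samples towards P*.
        h = model.sample_h_given_x(x)
        x = model.sample_x_given_h(h)
    return x
```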
The left image shows 10 different digits that have each been partially occluded. For each digit, we sample 10 different starting configurations from Q and then run a Markov chain that produces approximate samples from P* consistent with the unoccluded part of the initial digit.
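A rough sketch of this inpainting chain, again using the same hypothetical helpers and not the repository's actual code; the observed pixels are clamped after every step so that only the occluded region is resampled.

```python
# Hedged sketch of the occlusion/inpainting experiment (illustrative only).
# `mask` is 1 for observed pixels and 0 for occluded ones.
import numpy as np

def inpaint(model, x_occluded, mask, n_chains=10, n_steps=250):
    # Replicate the occluded digit for several independent chains.
    x = np.repeat(x_occluded[None, :], n_chains, axis=0)
    for _ in range(n_steps):
        h = model.sample_h_given_x(x)       # bottom-up inference with Q
        x_new = model.sample_x_given_h(h)   # top-down regeneration with P
        # Clamp the observed pixels; only the occluded region changes.
        x = mask * x_occluded + (1 - mask) * x_new
    return x
```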
This code depends on Fuel, Theano, Blocks, and various other libraries from the scientific Python ecosystem.