This is a simple demonstration of the bits-back encoding mechanism, using an extreme example in which the data consists of only two distinct samples. The true entropy of the dataset is therefore ln 2 ≈ 0.693 nats.
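The entropy figure follows directly from the two-sample setup; a one-line check:

```python
import math

# Two equally likely distinct samples -> entropy = -2 * 0.5 * ln(0.5) = ln 2 nats.
dataset_entropy_nats = -2 * 0.5 * math.log(0.5)
```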
By varying the number of dimensions in each sample while keeping the decoder's
output dimensions independent, we can easily demonstrate that Z strongly
resists encoding X when the number of dimensions is small: the smaller the
number of dimensions, the less important it is for Z to encode global
information about X. When ndims=1, the decoder alone can explain the data, and
Z is never used. This failure to encode into Z shows up as a poor
reconstruction cost for ndims=1, even though that setting also achieves the
lowest total loss (the decoder is capable of explaining the data all by itself).
As ndims increases, there is increasing pressure to encode into Z in order to
coordinate the independent dimensions of the decoder. The VAE must now perform
near-perfect reconstruction while simultaneously minimizing the KL divergence
between its encoding distribution and the unit Gaussian prior. The 2-layer VAE
is more successful at this because its top layer expands the variational
family, which enables its encoding distribution to better match the unit
Gaussian prior.
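The exact architecture of the experiment is not shown here, but for a diagonal-Gaussian encoder the KL term against the unit Gaussian prior has the standard closed form, which makes the pressure described above concrete: any per-dimension deviation of the encoding distribution from N(0, 1) is paid for in nats. A minimal sketch:

```python
import math

def kl_diag_gaussian_to_standard(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in nats.

    Per-dimension closed form: 0.5 * (mu^2 + sigma^2 - 1 - ln sigma^2),
    summed over dimensions. Zero iff mu = 0 and sigma = 1 everywhere.
    """
    return sum(0.5 * (m * m + s * s - 1.0 - 2.0 * math.log(s))
               for m, s in zip(mu, sigma))
```

For example, an encoder that shifts a single dimension to mean 1 (keeping unit variance) pays 0.5 nats of KL for that dimension.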
Looking at the case of ndims=1000, we see very clearly the importance of
expanding the variational family. The simplest generative process is:
- When z < 0, generate x = 1
- When z >= 0, generate x = 0 (or vice versa)
This means the true posterior p(z|x) is a truncated Gaussian. If we are limited to the Gaussian family for q(z|x), we can never drive KL(q(z|x) || p(z|x)) meaningfully toward zero.
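Under this idealized hard-threshold decoder (in practice the decoder is stochastic, so this is an approximation), the posterior for one sample is a standard Gaussian truncated at zero. Any full-support Gaussian q then places nonzero mass on the half-line where the posterior density is zero, which makes KL(q || p) infinite there. A small sketch computing that misplaced mass (function name is illustrative):

```python
import math

def gauss_tail_mass(mu, sigma):
    """P(Z >= 0) for Z ~ N(mu, sigma^2): the probability mass a Gaussian q
    places on the region where the truncated posterior is exactly zero.
    This mass is strictly positive for every mu, sigma > 0, so a Gaussian
    q(z|x) can never match a posterior truncated at z = 0."""
    return 0.5 * math.erfc(-mu / (sigma * math.sqrt(2.0)))
```

Even a Gaussian pushed far into the negative half (e.g. mu = -3, sigma = 1) still leaks a small but nonzero amount of mass past zero.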
A 2-layer VAE still has a Gaussian q(z2|z1), but the induced marginal q(z2|x) = ∫ q(z2|z1) q(z1|x) dz1 is a continuous mixture of Gaussians and is potentially arbitrarily complex, enabling a much better approximation of the truncated Gaussian.
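This marginal can be seen directly by ancestral sampling. A toy sketch with hypothetical encoder parameters (q(z1|x) = N(0,1), q(z2|z1) = N(tanh(z1), 0.1²), chosen only for illustration): each conditional is Gaussian, yet the pooled z2 samples follow a non-Gaussian continuous mixture that a single-layer Gaussian encoder cannot represent.

```python
import math
import random

def sample_q_z2_given_x(n=10000, seed=0):
    """Ancestral sampling from a toy 2-layer encoder (hypothetical params):
      q(z1|x)  = N(0, 1)
      q(z2|z1) = N(tanh(z1), 0.1^2)
    Each conditional is Gaussian, but the marginal q(z2|x) is a continuous
    Gaussian mixture -- far more flexible than any single Gaussian."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)          # sample the lower latent
        z2 = rng.gauss(math.tanh(z1), 0.1)  # sample the top latent given z1
        samples.append(z2)
    return samples
```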