1. Pre-trained Parameters
Trained on MNIST for 84 epochs (vae_mnist.pth)
seed=888, recon_weight=600, lr=0.0005, batch_size=64
val_recon_loss=0.1085, val_kld_loss=7.3032
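For reference, a minimal sketch of loading this checkpoint in PyTorch; the `VAE` class name and its import path are assumptions, not the repo's actual API:
```python
# Minimal loading sketch; `from model import VAE` is an assumed import path.
import torch

from model import VAE  # hypothetical: use this repo's actual model class

model = VAE()
state_dict = torch.load("/.../datasets/vae/vae_mnist.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode for visualization
```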
2. Visualization
# e.g. (`--seed` and `--batch_size` are optional; `--target` is `"mean"` or `"std"`)
python3 vis/encoder_output/main.py \
    --seed=888 \
    --batch_size=64 \
    --target="mean" \
    --model_params="/.../datasets/vae/vae_mnist.pth" \
    --data_dir="/.../datasets" \
    --save_dir="/.../workspace/VAE/vis/encoder_output"
Mean and STD of MNIST Test Set
For the means, the clusters for 4 and 9 and for 3 and 5 overlap heavily.
For the standard deviations, training encourages them to approach 1, yet the values sit close to 0. This visualization does not seem to offer much insight.
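For reference, a minimal sketch of how such a latent scatter plot could be produced, assuming a 2-D latent space and a `VAE` whose `encode` method returns `(mean, log_var)` (all assumptions):
```python
# Scatter plot of encoder means over the MNIST test set, colored by digit.
# `VAE`, its import path, and the `encode` interface are assumptions.
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from model import VAE  # hypothetical import path

model = VAE()
model.load_state_dict(torch.load("/.../datasets/vae/vae_mnist.pth", map_location="cpu"))
model.eval()

test_set = datasets.MNIST(root="/.../datasets", train=False, download=True,
                          transform=transforms.ToTensor())
means, labels = [], []
with torch.no_grad():
    for x, y in DataLoader(test_set, batch_size=64):
        mean, log_var = model.encode(x)  # assumed encoder interface
        means.append(mean)
        labels.append(y)
means, labels = torch.cat(means), torch.cat(labels)

plt.scatter(means[:, 0], means[:, 1], c=labels, cmap="tab10", s=2)
plt.colorbar()
plt.savefig("encoder_output_mean.png")
```
Plotting `torch.exp(0.5 * log_var)` instead of `mean` gives the corresponding standard-deviation plot.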
# e.g. (`--seed`, `--latent_min`, `--latent_max`, and `--n_cells` are optional)
python3 vis/decoder_output/main.py \
    --seed=888 \
    --latent_min=-4 \
    --latent_max=4 \
    --n_cells=32 \
    --model_params="/.../datasets/vae/vae_mnist.pth" \
    --data_dir="/.../datasets" \
    --save_dir="/.../workspace/VAE/vis/decoder_output"
latent_min=-4, latent_max=4, n_cells=32
You can see that it closely resembles the distribution of the encoder output means.
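For reference, a minimal sketch of the latent-grid visualization, again assuming a 2-D latent space and a `VAE` with a `decode` method (both assumptions):
```python
# Decode an n_cells x n_cells grid of latent points in [latent_min, latent_max]^2.
# `VAE`, its import path, and the `decode` interface are assumptions.
import torch
from torchvision.utils import save_image

from model import VAE  # hypothetical import path

model = VAE()
model.load_state_dict(torch.load("/.../datasets/vae/vae_mnist.pth", map_location="cpu"))
model.eval()

latent_min, latent_max, n_cells = -4.0, 4.0, 32
axis = torch.linspace(latent_min, latent_max, n_cells)
grid = torch.cartesian_prod(axis, axis)  # (n_cells**2, 2) latent points
with torch.no_grad():
    images = model.decode(grid)  # assumed to return (N, 1, 28, 28) images
save_image(images, "decoder_output.png", nrow=n_cells)
```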
# e.g. (`--seed` and `--batch_size` are optional)
python3 vis/reconstruct/main.py \
    --seed=888 \
    --batch_size=128 \
    --model_params="/.../datasets/vae/vae_mnist.pth" \
    --data_dir="/.../datasets" \
    --save_dir="/.../workspace/VAE/vis/reconstruct"
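For reference, a minimal sketch of the reconstruction visualization; the `VAE` class and a `forward` returning `(recon, mean, log_var)` are assumptions:
```python
# Reconstruct a test batch and save originals next to reconstructions.
# `VAE`, its import path, and the forward signature are assumptions.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.utils import save_image

from model import VAE  # hypothetical import path

model = VAE()
model.load_state_dict(torch.load("/.../datasets/vae/vae_mnist.pth", map_location="cpu"))
model.eval()

test_set = datasets.MNIST(root="/.../datasets", train=False, download=True,
                          transform=transforms.ToTensor())
x, _ = next(iter(DataLoader(test_set, batch_size=128)))
with torch.no_grad():
    recon, mean, log_var = model(x)  # assumed forward signature
save_image(torch.cat([x[:8], recon[:8]]), "reconstruction.png", nrow=8)
```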
3. Theoretical Background
1) Bayes' Theorem
$$P(A \vert B) = \frac{P(B \vert A)P(A)}{P(B)}, \text{ if } P(B) \neq 0$$
$P(A \vert B)$ is the conditional probability, or posterior probability, of $A$ given $B$.
$P(B \vert A)$ is the likelihood; $P(A)$ and $P(B)$ are known as the prior probability and the marginal probability, respectively.
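As a quick numerical illustration (the numbers are invented for the example): let $A$ be a condition with prevalence $P(A) = 0.01$, and let $B$ be a positive test with $P(B \vert A) = 0.9$ and $P(B \vert \neg A) = 0.05$. Then
$$P(B) = 0.9 \cdot 0.01 + 0.05 \cdot 0.99 = 0.0585, \qquad P(A \vert B) = \frac{0.9 \cdot 0.01}{0.0585} \approx 0.154$$
so even a positive test leaves the posterior probability well below 1.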
2) ELBO (Evidence Lower BOund)
Because $q_{\phi}(z \vert x)$ is a probability density over $z$, it integrates to 1:
$$\int q_{\phi}(z \vert x)dz = 1$$
$$
\begin{align}
\ln(P(x))
&= \int \ln(P(x))q_{\phi}(z \vert x)dz\\
&= \int \ln \bigg(\frac{P(z, x)}{P(z \vert x)}\bigg)q_{\phi}(z \vert x)dz\\
&= \int \ln \bigg(\frac{P(z, x)}{q_{\phi}(z \vert x)}\frac{q_{\phi}(z \vert x)}{P(z \vert x)}\bigg)q_{\phi}(z \vert x)dz\\
&= \int \ln \bigg(\frac{P(z, x)}{q_{\phi}(z \vert x)}\bigg)q_{\phi}(z \vert x)dz + \int \ln \bigg(\frac{q_{\phi}(z \vert x)}{P(z \vert x)}\bigg)q_{\phi}(z \vert x)dz\\
\end{align}
$$
The first integral is the ELBO and the second is the KL-divergence $D_{KL}\big(q_{\phi}(z \vert x) \,\Vert\, P(z \vert x)\big) \geq 0$. A basic result in variational inference follows: because $\ln(P(x))$ does not depend on $\phi$, minimizing the KL-divergence is equivalent to maximizing the ELBO, a lower bound on the log-likelihood [2].
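The decomposition can be checked numerically on a toy 1-D Gaussian model; the densities below are invented purely for the check:
```python
# Numerical check that ln P(x) = ELBO + KL(q || P(z|x)) for a toy model with
# P(z) = N(0, 1), P(x|z) = N(z, 1), and q(z|x) = N(0.5, 0.8^2) (all invented).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x = 0.7

def joint(z):  # P(z, x) = P(x | z) * P(z)
    return norm.pdf(x, loc=z, scale=1.0) * norm.pdf(z)

def q(z):  # variational posterior q(z | x)
    return norm.pdf(z, loc=0.5, scale=0.8)

p_x, _ = quad(joint, -10, 10)  # marginal likelihood P(x)
elbo, _ = quad(lambda z: q(z) * np.log(joint(z) / q(z)), -10, 10)
kl, _ = quad(lambda z: q(z) * np.log(q(z) * p_x / joint(z)), -10, 10)

print(np.log(p_x), elbo + kl)  # both print the same value
```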
$$
\begin{align}
\text{ELBO}
&= \int \ln \bigg(\frac{P(z, x)}{q_{\phi}(z \vert x)}\bigg)q_{\phi}(z \vert x)dz\\
&= \int \ln \bigg(\frac{P(x \vert z)P(z)}{q_{\phi}(z \vert x)}\bigg)q_{\phi}(z \vert x)dz\\
&= \int \ln \big(P(x \vert z)\big)q_{\phi}(z \vert x)dz + \int \ln \bigg(\frac{P(z)}{q_{\phi}(z \vert x)}\bigg)q_{\phi}(z \vert x)dz
\end{align}
$$
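The first term is the expected reconstruction log-likelihood and the second is $-D_{KL}\big(q_{\phi}(z \vert x) \,\Vert\, P(z)\big)$; negating and weighting the terms gives the usual VAE training loss. A minimal sketch of such a loss in PyTorch, assuming Bernoulli reconstruction and a standard-normal prior (the exact objective used to train `vae_mnist.pth` is an assumption):
```python
# Negative-ELBO-style loss: weighted reconstruction term plus closed-form
# KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder. The weighting and
# reductions here are assumptions, not necessarily this repo's exact loss.
import torch
import torch.nn.functional as F

def vae_loss(x, recon, mean, log_var, recon_weight=600.0):
    recon_loss = F.binary_cross_entropy(recon, x, reduction="mean")
    kld_loss = -0.5 * torch.mean(1 + log_var - mean.pow(2) - log_var.exp())
    return recon_weight * recon_loss + kld_loss
```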