Wrong generative process
gordon-lim opened this issue · 5 comments
Firstly, thank you for the tutorial!
For your generative process, you are "Generating image from noise vector" even though you have already learnt the latent space. This is actually the problem with vanilla autoencoders: the latent space is not learnt, so when you just randomly pick a latent vector, chances are it is not within the latent space, i.e. it is one that the decoder doesn't know how to work with.
To do it right, you should 1) sample an image, 2) make a forward pass with that image to get a mean and covariance, and 3) use the aforementioned mean and covariance to sample a latent vector.
Mathematically speaking (hopefully the notation is universal), you are sampling x and using the probabilistic encoder p(z|x), where the given x is the one you just sampled, to sample a latent variable. That latent variable then lies within the latent space that the decoder knows how to work with. And of course, the generated image won't be from the training dataset.
I figured you may have already noticed something was wrong considering your generated results, so I hope this clears things up.
with torch.no_grad():
    x, _ = next(iter(train_loader))
    x = x[0]
    sampled_x = x.view(1, x_dim)
    _, mean, log_var = model.forward(sampled_x)
    var = torch.exp(0.5*log_var)
    epsilon = torch.rand_like(var)
    sampled_z = mean + var*epsilon
    generated_images = decoder(sampled_z)
This is some code I used with your notebook to do it the proper way. Hope it makes sense. Thanks again for the tutorial.
May I ask why both of you are using torch.rand_like, which uses a uniform distribution?
oops i thought it was a normal distribution. i didn't catch that. thanks!
Dear @gordon-lim ,
First of all, thank you for leaving this issue.
Yes, right. Strictly speaking, generating images from noise vectors is not the generative process of a Variational AutoEncoder (VAE).
However, in this tutorial I wanted to show that we can even generate images from a noise vector (z ~ N(0, I)), though it is not the generative process of a VAE.
In other words, even if we don't know the exact p(z|x), we can generate images from noise, since the VAE training loss regularizes q(z|x) (a simple, tractable approximate posterior) to be close to N(0, I). If q(z|x) is close "enough" to N(0, I) (but not tightly close, due to the posterior collapse problem), N(0, I) may replace the encoder of the VAE.
So I wanted to show what I described, and included the wrong generation process. However, I have just realized that someone could get confused by my naive explanation. I will fix my code as you suggested.
Again, thank you for your suggestion and comments about this tutorial.
Also, sorry for the confusion.
I will edit the Jupyter notebook and explain the generation process within a few days.
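The noise-vector generation described above (drawing z ~ N(0, I) and passing it to the decoder) can be sketched roughly as follows. This is only an illustration under assumptions: the `decoder` here is an untrained stand-in rather than the tutorial's model, and `latent_dim`, `x_dim`, and `batch_size` are hypothetical values.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this sketch.
latent_dim, x_dim, batch_size = 20, 784, 16

# Stand-in decoder (untrained); in the notebook this would be the trained VAE decoder.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 400),
    nn.ReLU(),
    nn.Linear(400, x_dim),
    nn.Sigmoid(),  # keeps outputs in [0, 1], like pixel intensities
)
decoder.eval()

with torch.no_grad():
    # torch.randn draws from the standard normal N(0, I);
    # torch.rand would draw from the uniform distribution U[0, 1) instead.
    z = torch.randn(batch_size, latent_dim)
    generated_images = decoder(z)  # shape: (batch_size, x_dim)
```

With a trained decoder, how good these samples look depends on how closely q(z|x) matches N(0, I) after training.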
Dear @vnmabus ,
We must use a Normal distribution.
I didn't catch it since torch.randn_like and torch.rand_like look almost the same.
Thank you for pointing out a severe mistake.
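To make the difference concrete, here is a small sketch comparing the two calls and using the normal one in the reparameterization step. The `reparameterize` helper and the zero-valued `mean` and `log_var` tensors are hypothetical, introduced only for illustration; they are not the notebook's code.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

def reparameterize(mean, log_var):
    """Draw z ~ N(mean, diag(exp(log_var))) as z = mean + std * epsilon."""
    std = torch.exp(0.5 * log_var)   # exp(0.5 * log_var) is the standard deviation, not the variance
    epsilon = torch.randn_like(std)  # standard normal noise, epsilon ~ N(0, I)
    return mean + std * epsilon

# Hypothetical encoder outputs, used only to compare the noise sources.
mean = torch.zeros(100_000)
log_var = torch.zeros(100_000)

uniform_noise = torch.rand_like(mean)   # U[0, 1): never negative, mean near 0.5
normal_noise = torch.randn_like(mean)   # N(0, 1): symmetric around 0, std near 1
z = reparameterize(mean, log_var)

print(uniform_noise.mean().item(), uniform_noise.std().item())
print(normal_noise.mean().item(), normal_noise.std().item())
```

Because uniform noise is never negative, using torch.rand_like would shift every sampled z to one side of the mean, which is not a draw from the Gaussian the encoder parameterizes.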
I see. I did suspect it might have been intentional because you otherwise seem to fully understand VAEs. Thank you for the clarification and I look forward to the update.