Rayhane-mamah/Efficient-VDVAE

pytorch generation 1024 celeba_hq

miaoYuanyuan opened this issue · 2 comments

I checked the downloaded logs; the train-time reconstruction images are very good, but why are the images I generate so poor?

Hello @miaoYuanyuan, and thank you for your interest in our work :)

To make sure I understand your question correctly:

  • You downloaded the logs of CelebAHQ 1024, looked at the reconstructed images in TensorBoard, and they looked good (almost perfect reconstruction).
  • Then you tried to generate new images from the latent space using `python synthesize.py` in "generation" mode, and the images looked bad.

Reconstructed samples from the posterior $q_{\phi}(z|x)$ should always be quasi-perfect for a VAE (source).

That is totally normal for this model (see page 24 of our paper). We expect this comes down to the fact that, at the higher resolution of 1024x1024, the model is not deep/complex enough to capture the data distribution well. As a result, sampling from the prior distribution $p_{\theta}(z)$ during inference does not give good samples.
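To make the distinction concrete, here is a minimal toy sketch of the two sampling paths. This is an illustration only, not the Efficient-VDVAE code: `ToyVAE` and all names are hypothetical, and the toy uses a fixed $\mathcal{N}(0, I)$ prior instead of the learned hierarchical prior $p_{\theta}(z)$, but the mechanism is the same.

```python
# Toy sketch (hypothetical names, not the Efficient-VDVAE implementation).
import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs [mu, log_var] of q(z|x)
        self.dec = nn.Linear(z_dim, x_dim)      # parameterizes p(x|z)
        self.z_dim = z_dim

    def posterior(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        return mu, log_var

    def decode(self, z):
        return torch.sigmoid(self.dec(z))

model = ToyVAE()
x = torch.rand(8, 784)  # a batch of (fake) flattened images

# Path 1: reconstruction. z is sampled from the posterior q(z|x), which is
# conditioned on the input, so the decoder receives latents that describe x
# and the output stays close to x (quasi-perfect with a strong model).
mu, log_var = model.posterior(x)
z_post = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
x_recon = model.decode(z_post)

# Path 2: generation. z is sampled from the prior with no information about
# any real image. Samples look good only if the prior matches the latents
# the decoder was trained on; this is the failure mode described above for
# CelebAHQ 1024.
z_prior = torch.randn(8, model.z_dim)
x_gen = model.decode(z_prior)
```

Reconstruction succeeds because `z_post` is conditioned on the input, while generation quality depends entirely on how well the prior covers the latent space the decoder was trained against.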

For this model (Efficient-VDVAE), if you want decent samples generated by random sampling from the prior, we suggest using the lower-resolution datasets (for example, CelebAHQ 256).

Hope this answers the question! :)
Rayhane.

Thank you for your reply! @Rayhane-mamah
I generated at a resolution of 256x256, and the results are good.