gaozhihan/PreDiff

The latent shape (4, 16, 16) in the paper is not the same as that in the code (64, 16, 16)

xiaochengfuhuo opened this issue · 4 comments

The latent_channels in scripts/vae/sevirlr/cfg.yaml is 64, but the latent_channels in the paper's Implementation Details is 4.
Will it still reduce the training time during denoising when latent_channels is set to 64? The original data has 128 * 128 elements, which equals 64 * 16 * 16, so the latent is no smaller than the input.
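As a quick sanity check of that arithmetic (purely illustrative, using the shapes mentioned in this thread):

```python
# Per-frame element counts, using the shapes from this thread.
original = 128 * 128        # 16384 pixels in a 128 x 128 frame
latent = 64 * 16 * 16       # 16384 values in a (64, 16, 16) latent
assert original == latent   # the latent is not smaller than the input
```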

Thank you for your question. We followed the default config of LDM, using latent_channels = 4, and reported that performance in our paper. We found that setting latent_channels = 64 gives a more robust model that is less sensitive to the optimization hyperparameters. In this repo, we set latent_channels = 64 to make it convenient to reproduce our results, and we release the corresponding pretrained weights for consistency.
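For anyone who wants to train with the paper's setting instead, a minimal sketch of overriding the config (this assumes an OmegaConf-style YAML config; the key nesting is a guess, and only the file path and the key name latent_channels come from this thread):

```python
from omegaconf import OmegaConf

# "model.latent_channels" is a hypothetical nesting -- check cfg.yaml
# for where latent_channels actually lives in this repo.
cfg = OmegaConf.load("scripts/vae/sevirlr/cfg.yaml")
print(OmegaConf.select(cfg, "model.latent_channels"))  # 64 in this repo

# Switch to the paper's setting before building the model.
OmegaConf.update(cfg, "model.latent_channels", 4)
```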

Thank you for your reply; it is helpful. How about the training and sampling time? Will latent_channels = 64 take much longer than latent_channels = 4?

Thank you for your follow-up question. The computational cost is not bottlenecked by the hyperparameter latent_channels. In our VAE, the channel dimensions always increase to 512, regardless of the value of latent_channels. Similarly, the channel dimensions in our Earthformer-UNet always increase to 256 and 512. As such, the choice of latent_channels does not significantly impact the training or inference time. In our experiments, changing latent_channels from 4 to 64 did not even double the computational cost.
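To see why, here is a toy sketch (this is not the actual PreDiff VAE; the layer layout and kernel sizes are made up, and only the "hidden widths grow to a fixed 512" pattern follows the reply above):

```python
import torch.nn as nn

def toy_encoder(latent_channels: int) -> nn.Sequential:
    # Stand-in encoder: the hidden widths are fixed (growing to 512, as
    # described above); only the final 1x1 projection sees latent_channels.
    return nn.Sequential(
        nn.Conv2d(1, 128, 3, stride=2, padding=1),    # 128x128 -> 64x64
        nn.Conv2d(128, 256, 3, stride=2, padding=1),  # 64x64  -> 32x32
        nn.Conv2d(256, 512, 3, stride=2, padding=1),  # 32x32  -> 16x16
        nn.Conv2d(512, latent_channels, kernel_size=1),
    )

for c in (4, 64):
    n = sum(p.numel() for p in toy_encoder(c).parameters())
    print(f"latent_channels={c:>2}: {n:,} parameters")
```

In this toy model, going from 4 to 64 latent channels changes the parameter count by roughly 2%, because only the final projection depends on latent_channels; the same argument applies to the FLOPs.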

Thanks a lot. I got it.