nv-tlabs/LION

Why is the VAE reconstruction so similar to the ground truth even in the early training stage?

OswaldoBornemann opened this issue · 5 comments


@ZENGXH I know that you referred to something similar in an issue here. So, I am very curious about how you evaluate the reconstruction performance in your paper appendix, as shown in Table 23 and Table 24. Do you add some noise to the reconstruction? Otherwise, I would expect the EMD and CD values to be much lower.
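For context, a minimal sketch of the symmetric Chamfer distance (CD) commonly reported in such reconstruction tables — conventions vary between papers (squared vs. unsquared distances, sum vs. mean), so this is illustrative rather than LION's exact evaluation code:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3).

    Uses squared Euclidean nearest-neighbour distances averaged over each
    cloud, one common convention; the paper's eval code may differ.
    """
    # pairwise squared distances, shape (N, M)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    # nearest-neighbour term in each direction
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

A reconstruction that matches the ground truth exactly would give CD = 0; sampling (rather than taking the posterior mean) adds noise and raises the reported value.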

@ZENGXH I would also like to ask another question. It seems that input_pts in pointflow_datasets.py is the same as tr_out. So when you pass input_pts as noisy_input, is it actually just the same as val_x? I am not sure if I am right.

val_x, it=step, is_eval_nll=1, noisy_input=inputs, **model_kwargs)

@ZENGXH How many epochs did you train the VAE used in the autoencoding experiment? Is that trained VAE also used for the diffusion training?

ZENGXH commented

evaluate the reconstruction performance in your paper appendix, which is shown in Table 23 and Table 24.

When we evaluate the reconstruction (we evaluate the last VAE checkpoint), we sample from the posterior, meaning the latent is drawn from N(network_output_mu, network_output_logsigma).
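In code, sampling from the posterior usually means a reparameterized draw from the encoder's predicted Gaussian — a minimal sketch, assuming `mu` and `logsigma` are the encoder outputs (names are illustrative):

```python
import numpy as np

def sample_posterior(mu, logsigma, rng=None):
    """Reparameterized posterior sample: z = mu + exp(logsigma) * eps,
    with eps ~ N(0, I).

    mu / logsigma are assumed per-element encoder outputs; LION's actual
    code may parameterize the variance differently.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(logsigma) * eps
```

This sampling step is why the reported EMD/CD are higher than what the posterior mean alone would give: even a well-trained encoder injects noise proportional to exp(logsigma).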

define input_pts as noisy_input,

Yes, the noisy_input in regular VAE training is the same as the input; I name it "noisy" because we use different input points in the experiment "Encoder Fine-tuning for Voxel-Conditioned Synthesis and Denoising" under paper Section 3.1.
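A sketch of what this distinction amounts to — in regular VAE training the encoder input and the reconstruction target coincide, while the denoising fine-tuning feeds a perturbed copy. The function name, mode flag, and Gaussian perturbation here are illustrative, not LION's actual implementation:

```python
import numpy as np

def get_encoder_input(clean_pts, mode="vae", noise_std=0.1, rng=None):
    """Return the points fed to the encoder.

    mode="vae":      noisy_input == input (regular VAE training)
    mode="denoise":  a perturbed copy, as in the encoder fine-tuning
                     experiment (perturbation model is an assumption here)
    """
    if mode == "vae":
        return clean_pts  # identical to the reconstruction target
    rng = np.random.default_rng() if rng is None else rng
    return clean_pts + noise_std * rng.standard_normal(clean_pts.shape)
```

So in the regular training path, passing input_pts as noisy_input is indeed a no-op rename.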

epoch of vae

I train the VAE for 8000 epochs; the last checkpoint is used for the diffusion training.

Thank you very much.