wl-zhao/VPD

Question around number of denoising steps

agneet42 opened this issue · 2 comments

Hi @wl-zhao, if I understood correctly, for each RGB image you perform denoising conditioned on the text for 1 timestep, as shown here: https://github.com/wl-zhao/VPD/blob/main/depth/models_depth/model.py#L100
Is that correct? If so, why did you choose timestep = 1? Did you also explore more rounds of denoising, and did that help?

Hi, we use timestep=1 to reduce the stochasticity of inference. We tested in vpd_seg and found that using a larger denoising timestep harms performance. However, I think a properly chosen timestep might serve as an augmentation and could therefore be beneficial in some tasks. Feel free to give it a try; I am eager to hear your feedback :-)
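
For anyone who wants to experiment with other timesteps, here is a rough sketch of how the noise level grows with the timestep in a standard DDPM forward process. It is not taken from the VPD code. The schedule values (1000 steps, betas from 0.00085 to 0.012, spaced linearly in square-root space) are an assumption based on the usual Stable Diffusion v1 configuration, and the helper names `scaled_linear_betas` and `noise_std` are made up for illustration. Whether VPD actually adds noise at the chosen timestep should be checked against model.py.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt space (assumed schedule, not read from VPD)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def noise_std(t, betas):
    """Std of the Gaussian noise in q(x_t | x_0), i.e. sqrt(1 - alpha_bar_t).

    t is a 0-based index into betas.
    """
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    return math.sqrt(1.0 - alpha_bar)


betas = scaled_linear_betas()
for t in (1, 100, 500, 999):
    # Larger t means a larger share of the signal is replaced by random noise.
    print(f"t={t:4d}  noise std={noise_std(t, betas):.4f}")
```

Under these assumptions the noise standard deviation at t=1 is only a few percent of the signal scale, while near t=999 it is close to 1, which is one way to read why a small timestep keeps inference nearly deterministic and a moderate one could act like a noise augmentation.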

Thanks @wl-zhao !