Question around number of denoising steps
agneet42 opened this issue · 2 comments
agneet42 commented
Hi @wl-zhao, if I understood correctly, for each RGB image you run a single text-conditioned denoising pass at timestep 1, as shown here: https://github.com/wl-zhao/VPD/blob/main/depth/models_depth/model.py#L100
Is that correct? If so, why did you choose timestep = 1? Did you also explore larger timesteps or more rounds of denoising, and did that help?
wl-zhao commented
Hi, we use timestep=1 to reduce the stochasticity of inference. We tested this in vpd_seg and found that using a larger timestep harms performance. However, I think a properly chosen timestep might act as a form of augmentation and could therefore be beneficial for some tasks. Feel free to give it a try; I am eager to hear your feedback :-)
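For intuition on why a small timestep keeps things close to deterministic, here is a minimal sketch. It is not taken from the VPD code. It assumes the standard DDPM forward process, where a sample at timestep t carries Gaussian noise with standard deviation sqrt(1 - alpha_bar_t), and it assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps. The schedule used by the actual backbone may differ, and VPD may not inject noise at all, so the function names (`alpha_bar`, `noise_std`) and the numbers are illustrative only.

```python
import math


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Assumes a linear beta schedule (hypothetical defaults, not read
    from the VPD configuration).
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_std(t, **schedule):
    """Standard deviation of the noise term at timestep t."""
    return math.sqrt(1.0 - alpha_bar(t, **schedule))


if __name__ == "__main__":
    # Compare how much noise each timestep would correspond to.
    for t in (1, 50, 250, 1000):
        print(f"t={t:4d}  noise_std={noise_std(t):.4f}")
```

Under these assumptions the noise level at t=1 is about 0.01 and grows monotonically toward 1 as t approaches 1000, which is one way to read the trade-off between a near-deterministic input and a noisier, augmentation-like input.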