Question around number of denoising steps

Question

agneet42 opened this issue a year ago · 2 comments

Hi @wl-zhao, if I understand correctly, for each RGB image you run a single text-conditioned denoising pass with the timestep set to 1, as shown here: https://github.com/wl-zhao/VPD/blob/main/depth/models_depth/model.py#L100.
Is that correct? If so, why did you choose timestep = 1? Did you also explore more rounds of denoising, and did that help?

Answer 1 · 2023-08-02T06:24:12.000Z

Hi, we use timestep=1 to reduce the stochasticity of inference. In our tests on vpd_seg, using a larger denoising step harmed performance. However, I think a suitable timestep might act as an augmentation and could therefore be beneficial in some tasks. Feel free to try it; I would be glad to hear your feedback :-)
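
For anyone who wants to experiment with larger timesteps, here is a rough sketch of why a small timestep goes with low stochasticity. It assumes the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with the default DDPM linear beta schedule (1e-4 to 0.02 over 1000 steps). Both are assumptions for illustration: this thread does not say which schedule VPD uses, or whether VPD adds noise to the latent at all. The helper names `alpha_bar` and `noise_std` are hypothetical and are not part of the VPD code.

```python
import math


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Assumes a linear beta schedule (the DDPM default), not VPD's actual one.
    """
    prod = 1.0
    for s in range(1, t + 1):
        # Linearly interpolate beta between beta_start and beta_end.
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_std(t, **schedule):
    """Std of the Gaussian noise term in x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    return math.sqrt(1.0 - alpha_bar(t, **schedule))


if __name__ == "__main__":
    # The noise level grows monotonically with the timestep.
    for t in (1, 50, 250, 1000):
        print(f"t={t:4d}  noise std={noise_std(t):.4f}")
```

Under these assumptions the noise standard deviation at t=1 is about 0.01, so the input is almost unperturbed, and it increases toward 1 as t approaches 1000. A mid-range timestep would therefore inject noticeably more randomness, which is consistent with the idea that it might work as an augmentation.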

Answer 2 · 2023-08-02T16:10:39.000Z

Thanks @wl-zhao !