CompVis/depth-fm

why is NFE=1 for marigold pure noise

Opened this issue · 4 comments

w-hc commented

Hi, thanks for the inspiring work.
In Fig. 6, the Marigold result at NFE=1 is pure noise, which seems counter-intuitive. At NFE=1 we should simply get the conditional mean of the prediction, i.e. x0 hat. It may be blurry, but it is hard to see why it would be pure noise.
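
For reference, a sketch of why a single step would be expected to return the model's clean-sample estimate. It assumes the standard deterministic DDIM update (eta = 0) written in epsilon-prediction form; Marigold's actual parameterization may differ, and the symbols below are introduced here for illustration rather than taken from the thread:

$$
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\hat\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}},
\qquad
x_s = \sqrt{\bar\alpha_s}\,\hat{x}_0 + \sqrt{1-\bar\alpha_s}\,\hat\epsilon_\theta(x_t, t)
$$

If the single step jumps from $t$ straight to the clean end, where $\bar\alpha_s = 1$, the second equation reduces to $x_s = \hat{x}_0$.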

I think NFE=1 is just another way of saying "1 denoising step"

Hi w-hc, as Fannovel16 pointed out, NFE=1 means predicting the depth in a single step. Marigold uses a DDIM sampler, which approximates the diffusion SDE with an ODE, and fewer inference steps result in a larger ODE approximation error. This basically always leads to noise or noisy images. Please refer to the DDIM and DPM-Solver papers for more details.

I believe the DDIM scheduler in diffusers is not intended to be used with NFE=1. In that case the diffusers implementation uses timestep t=1, so the model does basically nothing to the image. I think the correct way to do it is to use t=999 for one-step denoising.
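
To make the t=1 versus t=999 point concrete, here is a minimal pure-Python sketch of two common timestep-spacing rules, usually called "leading" and "trailing". The function names are hypothetical, and the arithmetic is my reading of how such schedulers pick timesteps, not a copy of the diffusers source. The values of 1000 training timesteps and an offset of 1 are assumptions chosen to match the numbers quoted above.

```python
def leading_timesteps(num_train_timesteps, num_inference_steps, steps_offset=0):
    """'Leading' spacing (hypothetical helper): start at 0 and step upward
    in integer strides, then reverse so sampling runs from high t to low t."""
    step_ratio = num_train_timesteps // num_inference_steps
    ascending = [i * step_ratio + steps_offset for i in range(num_inference_steps)]
    return ascending[::-1]


def trailing_timesteps(num_train_timesteps, num_inference_steps):
    """'Trailing' spacing (hypothetical helper): start at the last training
    timestep and step downward, so the first step is always the noisiest."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


# Assumed setup: 1000 training timesteps, a single inference step, offset 1.
print(leading_timesteps(1000, 1, steps_offset=1))  # [1]
print(trailing_timesteps(1000, 1))                 # [999]
```

Under these assumptions, a single "leading" step asks the model to denoise at t=1, where the input is treated as almost clean, while a single "trailing" step starts at t=999, which is what one-step denoising from pure noise needs.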

w-hc commented

I second Jiahao. The NFE=1 result for Marigold should be much better.