duanyiqun/DiffusionDepth

Unstable Training without L2 loss

Closed this issue · 4 comments

unlugi commented

Dear author,
Thank you so much for this amazing repo!

I have a question regarding training instability with certain loss combinations.

I trained two models:
(1) 1.0*L1 + 1.0*L2 + 1.0*DDIM (baseline)
(2) 1.0*L1 + 0.0*L2 + 1.0*DDIM (no L2)

Training loss graphs (PINK: baseline, GREY: no L2):
[image: training loss curves]

Have you ever encountered such an instability in your experiments? Do you know why this might be happening?
The model without L2 diverges: the quality of the reconstructions goes really bad after epoch 5, seemingly at random, and never recovers during training. The reason I don't want L2 is that I think it produces overly smooth depth maps and I need sharp reconstructions; L1 might be better for sharpness. FYI, I am working on a different dataset than yours, namely satellite images.
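For reference, this is roughly how I weight the three terms in my runs (a minimal sketch with hypothetical names; `ddim_loss` stands in for the diffusion objective computed elsewhere and is not the repo's exact implementation):

```python
import torch.nn.functional as F

# Hypothetical loss weights for the two runs above:
#   baseline: w_l1, w_l2, w_ddim = 1.0, 1.0, 1.0
#   no L2:    w_l1, w_l2, w_ddim = 1.0, 0.0, 1.0
def total_loss(pred_depth, gt_depth, ddim_loss, w_l1=1.0, w_l2=1.0, w_ddim=1.0):
    l1 = F.l1_loss(pred_depth, gt_depth)    # sharper, but less forgiving of outliers
    l2 = F.mse_loss(pred_depth, gt_depth)   # smoother, stabilizing term
    return w_l1 * l1 + w_l2 * l2 + w_ddim * ddim_loss
```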

Hi Unlugi,
Thank you very much for your interest. I'm also curious about this phenomenon.
I haven't encountered a similar problem in previous experiments. However, I think there might be two aspects to the issue:

1. It might be the way noise is added. Since most depth maps are sparse, this version adds noise to the predicted refined depth map itself instead of the GT map. If the remote-sensing satellite depth maps are dense (like images), I think switching to adding noise to the GT might help (see the sketch below).
2. Using only L1 might be more sensitive to the parameter scale and sign. I don't know exactly, but I suspect that adding a monotonic NN on the output layer might help stabilize L1, since depth is always positive (a non-decreasing output distribution).
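A rough sketch of what I mean by switching the forward-noising target from the predicted refined depth to the GT map (this is just the standard DDPM/DDIM q-sample step with hypothetical variable names, not the exact code in this repo):

```python
import torch

def q_sample(x0, t, alphas_cumprod, noise=None):
    """Standard forward diffusion: add noise to a clean target x0 at timestep t."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)  # cumulative noise schedule for batch timesteps
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Current behaviour (sparse GT): noise the predicted refined depth
# x_t = q_sample(refined_depth_pred, t, alphas_cumprod)
# Possible change for dense satellite depth: noise the GT depth instead
# x_t = q_sample(gt_depth, t, alphas_cumprod)
```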

unlugi commented

Hi author,

Thanks for your comments, I highly appreciate it.

Satellite depth maps are dense (all pixels have depth values), so I could try diffusing on the GT instead of the refined depth prediction from the previous timestep. However, I have also heard that feeding the previous step's prediction into the current step helps with the domain gap at test time, when there is no GT to diffuse on.

Regarding the L1 instability: yes, the model doesn't do well with only L1. I was using lr=5e-4 with L1, which was very unstable compared to L1+L2.
lr=1e-4 with L1 trains without instability, but the results are not as good as L1+L2. L2 seems to provide a stable training signal. I thought I could get sharper reconstructions with L1 only, but because it doesn't train well, even the L2 model beats it in sharpness.

For the monotonic NN, do you mean something like a layer with all-positive weights, i.e. constraining the layer's weights to be positive?

Thanks!

Hi there,

Regarding the monotonic NN: yes, I mean a non-decreasing function with all-positive weights and ReLU. I was thinking that L1 might be sensitive to sign variance (say, -1 vs. 1), so giving the output a strict range might help. L1 can be thought of as a comparison in Euclidean space, and a non-decreasing, positive output might keep it commensurate with the value scale.
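As a concrete (purely illustrative) example of such an output head, a small 1x1-conv MLP whose effective weights are kept positive via a softplus reparameterization, combined with ReLU, is non-decreasing in its input and always produces non-negative depth; the names below are hypothetical and this is not code from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicHead(nn.Module):
    """Output head that is non-decreasing in its input: all effective weights are
    forced positive (softplus) and the activation (ReLU) is monotone."""
    def __init__(self, in_ch, hidden=32):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, in_ch) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden) * 0.1)
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):                        # x: (B, in_ch, H, W)
        w1 = F.softplus(self.w1)                 # enforce positive weights
        w2 = F.softplus(self.w2)
        h = F.relu(F.conv2d(x, w1[..., None, None], self.b1))   # 1x1 conv, monotone
        return F.relu(F.conv2d(h, w2[..., None, None], self.b2))  # non-negative depth
```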

Best regards

unlugi commented

Thank you!