princeton-vl/RAFT-Stereo

Application of RAFT-Stereo to self-supervised learning

HaiLiExp opened this issue · 0 comments

Hello

I work on self-supervised learning for depth estimation. The only difference from supervised learning is a more involved loss computation: instead of comparing against ground truth, I back-project the image pixels into 3D space and then project them into the other camera. I tried MobileStereoNet and it works fine, but RAFT-Stereo learns nothing when training starts from scratch.
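To make the reprojection step concrete, here is a minimal sketch of what I mean, not my actual training code. It assumes a rectified pinhole stereo pair with identical intrinsics and a right camera shifted by `baseline` along the x axis; the function name and parameters are hypothetical and chosen only for illustration.

```python
def reproject_to_right(u, v, depth, fx, fy, cx, cy, baseline):
    """Back-project a left-image pixel to 3D, then project it into the right camera.

    Assumptions (illustrative only): rectified pinhole cameras, shared
    intrinsics (fx, fy, cx, cy), right camera located at +baseline along x
    in the left camera frame.
    """
    # Back-projection: pixel + depth -> 3D point in the left camera frame.
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    Z = depth

    # Express the point in the right camera frame (pure x translation).
    X_right = X - baseline

    # Projection into the right image.
    u_right = fx * X_right / Z + cx
    v_right = fy * Y / Z + cy
    return u_right, v_right


# Under these assumptions the horizontal shift equals the disparity fx * baseline / depth.
u_r, v_r = reproject_to_right(u=50.0, v=20.0, depth=5.0,
                              fx=100.0, fy=100.0, cx=32.0, cy=24.0,
                              baseline=0.5)
disparity = 100.0 * 0.5 / 5.0
print(u_r, v_r, 50.0 - disparity)
```

The loss then compares the left pixel with the right image sampled at the reprojected location, so it depends on the network output only through this warp.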

I am not sure whether you have read my earlier post. I found that fine-tuning needs a much lower learning rate than the one you proposed. With that, I get qualitatively reasonable results. The remaining question is how to train from scratch.

One difference from RAFT-Stereo is that my training loss is an average over the "error map" rather than a sum. Could that be a problem?
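As far as I understand, the two reductions differ only by a constant factor N (the number of pixels in the error map), so the gradients differ by the same factor. A minimal pure-Python sketch with a hypothetical L1 error and made-up values illustrates this; `l1_loss_and_grad` is an illustrative helper, not a function from the RAFT-Stereo code.

```python
def l1_loss_and_grad(pred, target, reduction):
    """L1 loss over a flat error map and its gradient w.r.t. pred.

    reduction: "sum" or "mean". Illustrative helper only.
    """
    diffs = [p - t for p, t in zip(pred, target)]
    scale = 1.0 if reduction == "sum" else 1.0 / len(diffs)
    loss = scale * sum(abs(d) for d in diffs)
    # Subgradient of |d| is sign(d); 0 is used where d == 0.
    grad = [scale * (1.0 if d > 0 else -1.0 if d < 0 else 0.0) for d in diffs]
    return loss, grad


pred = [1.0, 3.0, 2.0, 5.0]      # made-up predictions
target = [0.0, 4.0, 2.0, 1.0]    # made-up targets

loss_sum, grad_sum = l1_loss_and_grad(pred, target, "sum")
loss_mean, grad_mean = l1_loss_and_grad(pred, target, "mean")

n = len(pred)
print(loss_sum, loss_mean * n)               # same value
print(grad_sum, [g * n for g in grad_mean])  # same gradients
```

With plain SGD this factor acts like a rescaled learning rate. Adaptive optimizers such as Adam largely normalize it away, although gradient clipping and the optimizer's epsilon can still interact with the loss scale, so I cannot rule it out as a factor.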