princeton-vl/RAFT-Stereo

Question about fine-tuning on Middlebury2014

kongdebug opened this issue · 5 comments

Hi @lahavlipson,
Thank you for your great work!

I am using the weights trained on the SceneFlow dataset to fine-tune on the Middlebury dataset. After fine-tuning, the D1 results on the full Middlebury dataset are even worse than before. Is this normal?

The raftstereo-sceneflow.pth result is consistent with Table 1 of the paper:
[screenshot: evaluation results for the SceneFlow checkpoint]

However, the results after fine-tuning on the Middlebury2014 dataset are relatively poor:
[screenshot: evaluation results after fine-tuning]

I've found that the performance is better and more stable if the learning rate is small, e.g. --lr 0.00002, similar to what we use for KITTI; I've updated the command in the README.
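
For reference, a fine-tuning invocation along these lines might look like the sketch below. The flag names are based on the repo's train_stereo.py, but the dataset key, checkpoint path, and run name are assumptions; the command in the updated README should be treated as authoritative.

```
# Hedged sketch of Middlebury fine-tuning (dataset key, checkpoint path, and
# run name are assumptions; see the README for the exact command):
python train_stereo.py --name raft-stereo-middlebury \
    --restore_ckpt models/raftstereo-sceneflow.pth \
    --train_datasets middlebury_2014 \
    --lr 0.00002
```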


Thank you for your reply; I look forward to the updated README.


In addition, what learning rate did you use to fine-tune on the KITTI 2015 dataset?
Section 4.2 of the paper mentions that the minimum learning rate used for fine-tuning on KITTI 2015 is 1e-5. What is the maximum learning rate? Thank you!

On KITTI, we use --lr 0.00001
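
Applying the same sketch to KITTI would change only the dataset key and the learning rate; again, the exact flag names and dataset key are assumptions, not the confirmed README command.

```
# Hedged sketch of KITTI fine-tuning (flag names and dataset key assumed):
python train_stereo.py --name raft-stereo-kitti \
    --restore_ckpt models/raftstereo-sceneflow.pth \
    --train_datasets kitti \
    --lr 0.00001
```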


Thank you. I used --lr 0.0002 and submitted the results to the KITTI website for testing. The D1-all metrics are consistent with those of RAFT-Stereo on the leaderboard. However, fine-tuning on the Middlebury dataset with --lr 0.00002 did not reach the same precision as the Middlebury.pth weights you supplied.