fabiotosi92/NeRF-Supervised-Deep-Stereo

Question about test results on KITTI and Middlebury

husheng12345 opened this issue · 3 comments

Hello, I tested the pretrained RAFT-Stereo model using test.py; here are the results I get:

KITTI-15 All
EPE: 1.4704
bad 1.0: 26.43%
bad 2.0: 9.43%
bad 3.0: 5.56%

Midd-T F All
EPE: 9.1773
bad 1.0: 26.56%
bad 2.0: 18.44%
bad 3.0: 15.57%

I notice these results differ slightly from those reported in Table 6 of your paper. I wonder if there is something wrong with my code; I can upload the full test code I used if necessary. Thank you.

Hello, thank you for bringing this issue to my attention. I have just tested the model again using the code and the checkpoint provided in our repository. The results I am obtaining are as follows:

KITTI-15 All
bad 1.0: 25.97%
bad 2.0: 9.21%
bad 3.0: 5.39%

Midd-T F All
bad 1.0: 24.62%
bad 2.0: 16.46%
bad 3.0: 13.87%

I have noticed slight variations w.r.t. Table 6 of our paper after changing the PyTorch version. However, I cannot explain why the numbers you reported are so different. Have you made any modifications to the code? Have you verified that you are using the correct dataset?

I've found the issue. In the original RAFT-Stereo code, the default parameter `iters=12` should be set to 32 for evaluation. On line 70 of the original RAFT-Stereo code, the iters parameter should be changed as follows:

```diff
- def forward(self, image1, image2, iters=12, flow_init=None, test_mode=False):
+ def forward(self, image1, image2, iters=32, flow_init=None, test_mode=False):
```

This is the exact number of iterations used for evaluation in the original RAFT-Stereo paper.

Thank you for notifying me of the issue; I will update our repository's documentation.

Thank you very much, I've got the correct results now. I think a more appropriate modification would be to change this line:

```python
pred_disps = model(data['im2'], data['im3'])
```

by passing `iters=32` or `iters=args.valid_iters` to the RAFT-Stereo model, instead of editing the model's default.
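A minimal sketch of why the call-site override is the safer fix (the class below is an illustrative stand-in, not the actual RAFT-Stereo implementation): the default of 12 iterations stays in `forward()`, and the evaluation code passes the desired value explicitly.

```python
# Illustrative stand-in for the RAFT-Stereo model, NOT the real
# implementation: it only reports how many refinement iterations
# the caller requested.
class StereoModel:
    def forward(self, image1, image2, iters=12, flow_init=None, test_mode=False):
        # A real model would run `iters` GRU refinement steps here.
        return iters

    __call__ = forward  # mimic nn.Module's call-through to forward()


model = StereoModel()

# Relying on the default silently evaluates with 12 iterations:
print(model(None, None))            # 12
# Overriding at the call site matches the 32 used for evaluation:
print(model(None, None, iters=32))  # 32
```

In a real evaluation script the override would typically come from a command-line flag (e.g. `args.valid_iters`, as suggested above), so the pretrained weights and model code stay untouched.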