princeton-vl/RAFT-Stereo

Training settings for RAFT-stereo realtime

dilinwang820 opened this issue · 7 comments

Hi there, may I ask how many training iterations were used for RAFT-Stereo realtime? Thank you!

We trained RAFT-Stereo realtime for 200,000 iterations.

Hi @lahavlipson sorry, I was not clear before. I was referring to the number of GRU updates here - https://github.com/princeton-vl/RAFT-Stereo/blob/main/core/raft_stereo.py#L70

As mentioned in the README, the corresponding valid_iters is 7; I am wondering whether train_iters was also 7. Thanks!

train_iters was also set to 7 for this model.
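
For context, train_iters is the number of GRU refinement steps unrolled in the forward pass (the loop linked above), and the training loss is applied to every intermediate prediction with exponentially increasing weight. Below is a minimal RAFT-style sketch of that idea in PyTorch; the repo's actual sequence_loss may differ in details such as gamma rescaling and validity masking:

import torch

# Illustrative sketch only -- not the repo's exact implementation.
# Later iterations get weights closer to 1.0 (gamma < 1), so the final
# prediction dominates the loss.
def sequence_loss(disp_preds, disp_gt, valid, loss_gamma=0.9):
    n = len(disp_preds)                      # == train_iters (7 here)
    total = 0.0
    for i, pred in enumerate(disp_preds):
        weight = loss_gamma ** (n - i - 1)   # weight 1.0 on the last prediction
        total = total + weight * (valid * (pred - disp_gt).abs()).mean()
    return total

# Toy usage: 7 refinement outputs for a 1x1xHxW disparity map.
gt = torch.randn(1, 1, 32, 64)
preds = [gt + torch.randn_like(gt) * (0.5 ** i) for i in range(7)]
loss = sequence_loss(preds, gt, valid=torch.ones_like(gt))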

Hi @lahavlipson, I was able to reproduce your "raftstereo-sceneflow.pth" checkpoint with only minor differences.
However, there is a small gap between the realtime model I trained and your raftstereo-realtime.pth checkpoint.

Specifically, I constructed the realtime model per your suggestion above:

# Model config (RAFT-Stereo realtime)
model_args = dict(
    hidden_dims=[128] * 3,   # GRU hidden-state dims
    shared_backbone=True,    # share the feature/context encoder (realtime variant)
    corr_levels=4,           # correlation pyramid levels
    corr_radius=4,           # correlation lookup radius
    n_downsample=3,          # disparity field at 1/8 resolution (vs. 1/4 by default)
    slow_fast_gru=True,      # iterate the low-res GRUs more frequently
    n_gru_layers=2,          # 2 GRU levels instead of 3
    train_iters=7,
    valid_iters=7,
    freeze_bn=False,
)
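
For completeness, one way to turn these flags into a model instance; the import path follows the repo, but the glue code and the extra corr_implementation / mixed_precision attributes are my assumptions, so check them against the argparse flags in train_stereo.py:

from argparse import Namespace
from core.raft_stereo import RAFTStereo  # module path as in the repo

# Hypothetical glue code: RAFTStereo takes the parsed argparse namespace,
# so pack the config above into one. Attributes not listed in the config
# above are assumptions and may need adjusting.
args = Namespace(**model_args, corr_implementation="reg", mixed_precision=False)
model = RAFTStereo(args)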

Training is set up the same as the standard RAFT-Stereo recipe: total batch size 8 across 2 GPUs, trained only on the SceneFlow dataset.

# Optimizer and LR settings (mm-style config)
max_iter = 200000
lr = 1e-4
optimizer = dict(type="AdamW", lr=lr, weight_decay=0.00001)
optimizer_config = dict(grad_clip=dict(max_norm=1))  # clip grad norm to 1
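
In case the mm-style config above is unfamiliar, here is a plain-PyTorch sketch of what it amounts to, reusing args/RAFTStereo from the sketch above; the DataParallel wrapping mirrors how the official train_stereo.py appears to split the batch across GPUs, but treat the exact wiring as an assumption:

import torch
import torch.nn as nn

model = nn.DataParallel(RAFTStereo(args))  # total batch 8 split across 2 GPUs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)

# optimizer_config = dict(grad_clip=dict(max_norm=1)) corresponds to clipping
# after loss.backward() and before optimizer.step():
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)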

Anything I might be missing? Any suggestions are greatly appreciated!

Method                    KITTI-15 (3px)  Middlebury-F (2px)  Middlebury-H (2px)  Middlebury-Q (2px)  ETH3D (1px)
raftstereo-realtime ckpt  5.666           18.005              11.364              8.977               5.751
reproduced                6.249           17.354              11.492              9.846               5.725

Your settings seem fine; the evaluation datasets have few images or fairly sparse ground truth, so fluctuations in performance between runs are pretty normal (ETH3D, for example, has only a couple dozen two-view training pairs, so a single hard image can move the average noticeably).

Thank you for confirming!


Hi @dilinwang820,
I cannot reproduce the realtime model's results; mine are much worse than both yours and the released checkpoint (my ETH3D D1 is 6.61 and my FlyingThings3D 'things' D1 is 15). I noticed you left batch norm unfrozen (freeze_bn=False); could that be the key difference? Also, about the learning rate: I found the scheduler's max_lr=2e-4 with the default div_factor=25 and final_div_factor=1e4. Did you use those settings, or did you just use lr=1e-4, held constant for the whole training run?
Thank you!
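
For reference, the div_factor=25 and final_div_factor=1e4 defaults in this question are PyTorch's OneCycleLR defaults. A minimal, self-contained sketch of that schedule, assuming the official training script follows the usual RAFT-style OneCycleLR setup (pct_start and anneal_strategy below are assumptions): with max_lr=2e-4 the learning rate ramps up from max_lr/25 = 8e-6 to 2e-4 and then anneals back down, which is quite different from a constant lr=1e-4.

import torch
from torch import optim

net = torch.nn.Linear(1, 1)  # stand-in model, just to build an optimizer
optimizer = optim.AdamW(net.parameters(), lr=2e-4, weight_decay=1e-5)
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=2e-4, total_steps=200000,
    pct_start=0.01, cycle_momentum=False, anneal_strategy="linear")

for step in range(3):        # print the first few scheduled learning rates
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())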