naver/croco

How long does it take to train Croco-Stereo?


Does it take more time than pretraining? And was there any particular reason for using 3 GPUs rather than 8?

It takes about two weeks on my server. I wonder whether I can use 8 GPUs and linearly scale up the learning rate: 3e-5 * 8 / 3 = 8e-5?
Another option is the square-root scaling rule: 3e-5 * sqrt(8 / 3) ≈ 4.899e-5.
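
Spelled out, the two candidate rules are just arithmetic (plain Python; the square-root rule is the more conservative variant sometimes suggested for Adam-style optimizers):

```python
import math

base_lr, base_gpus, new_gpus = 3e-5, 3, 8

# Linear scaling rule: scale the learning rate with the effective batch size.
linear_lr = base_lr * new_gpus / base_gpus             # 8.000e-05

# Square-root scaling rule: a more conservative alternative.
sqrt_lr = base_lr * math.sqrt(new_gpus / base_gpus)    # ~4.899e-05

print(f"linear: {linear_lr:.3e}  sqrt: {sqrt_lr:.3e}")
```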

Hi,

It took about a week on 3 GPUs for the largest model if I remember correctly.

Note that when dealing with stereo/flow datasets, network filesystems and/or data augmentation on CPUs might be too slow. You can check this by looking at the regularly printed "data" time and verifying that it stays close to 0 (except for the first iteration).
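
For illustration, here is a minimal sketch of that kind of check with a standard PyTorch DataLoader (the dataset, batch size and worker count are placeholders, not the actual CroCo training loop):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for a stereo/flow dataset (placeholder only).
dataset = TensorDataset(torch.randn(256, 3, 224, 224))
loader = DataLoader(dataset, batch_size=6, num_workers=4)

end = time.time()
for it, (batch,) in enumerate(loader):
    data_time = time.time() - end  # time spent waiting for the next batch
    # ... forward/backward pass would go here ...
    if it % 10 == 0:
        # After the first iteration, data_time should be close to 0;
        # if it stays large, I/O or CPU augmentation is the bottleneck.
        print(f"iter {it}: data {data_time:.3f}s")
    end = time.time()
```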

We use a batch size of 6 in our experiments, so you could easily scale up to 6 GPUs while keeping the same effective batch size. Moving to 8 GPUs would mean changing the batch size, and it is never certain how that affects other hyperparameters such as the learning rate, so I cannot guarantee that your proposed scaling rules would work well.

The stereo finetuning remains slow. In some recent work on other tasks (SACReg or dust3r), we instead perform a 2-stage finetuning for speed: first a finetuning stage of the decoder and head alone at 224x224 resolution, then finetuning of the full network for fewer epochs at higher resolution. However, we did not apply this strategy to stereo and do not plan to study whether it works well there.
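
Roughly, such a 2-stage schedule could look like this in PyTorch (a sketch only; the `model.encoder` attribute, the `compute_loss` helper, the learning rates and the epoch counts are illustrative placeholders, not our actual training code):

```python
import torch

def finetune_two_stage(model, loader_224, loader_highres, device="cuda"):
    """Illustrative 2-stage schedule: decoder+head first, then the full network."""
    model.to(device)

    # Stage 1: freeze the encoder, finetune decoder & head at 224x224.
    for p in model.encoder.parameters():
        p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(trainable, lr=3e-5)
    for _ in range(10):  # placeholder epoch count
        for batch in loader_224:
            loss = model.compute_loss(batch)  # hypothetical loss helper
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: unfreeze everything, finetune for fewer epochs at higher resolution.
    for p in model.parameters():
        p.requires_grad = True
    opt = torch.optim.AdamW(model.parameters(), lr=3e-6)  # placeholder lr
    for _ in range(3):  # fewer epochs at the higher resolution
        for batch in loader_highres:
            loss = model.compute_loss(batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
```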

Best
Philippe