apple/ml-cvnets

Different accuracy on different GPUs

jimmylin0979 opened this issue · 4 comments

Hi, I trained the mobilevit-xxs model on two different machines and got different results: the accuracy on the Titan RTX is always 0.5% lower than on the RTX 2080 Ti.

Below are the specs of the two machines:

  • Machine 1: Titan RTX, PyTorch 1.10.0
  • Machine 2: RTX 2080 Ti, PyTorch 1.11.0

After checking the code, the only potential cause I can think of is AMP, but both GPUs use the TU102 chip, so they should support the same floating-point precisions.

Do you have any idea what might be causing the problem?

Thank you

Besides AMP, TF32 matmul could also be a culprit.
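One way to check this is to print the same precision-related settings on both machines and compare them. The snippet below is a minimal sketch, not part of cvnets: `precision_report` and `set_tf32` are hypothetical helper names, while the `torch.backends` flags are standard PyTorch settings. Note that TF32 tensor-core math is available on Ampere-generation and newer GPUs (compute capability 8.0 and up), so on Turing cards these flags are expected to have no effect; the report includes the compute capability so that can be verified.

```python
import torch


def precision_report() -> dict:
    """Collect version and precision settings that can differ between machines."""
    has_cuda = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "gpu": torch.cuda.get_device_name(0) if has_cuda else None,
        # (major, minor); TF32 needs major >= 8
        "compute_capability": (
            torch.cuda.get_device_capability(0) if has_cuda else None
        ),
        "matmul_allow_tf32": torch.backends.cuda.matmul.allow_tf32,
        "cudnn_allow_tf32": torch.backends.cudnn.allow_tf32,
    }


def set_tf32(enabled: bool) -> None:
    """Set both TF32 flags explicitly so every machine uses the same values."""
    torch.backends.cuda.matmul.allow_tf32 = enabled
    torch.backends.cudnn.allow_tf32 = enabled


if __name__ == "__main__":
    set_tf32(True)
    for key, value in precision_report().items():
        print(f"{key}: {value}")
```

Setting the flags explicitly at the start of training, rather than relying on defaults, removes one variable, since the defaults have changed across PyTorch versions.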

Thanks for the fast reply!

> Besides AMP, TF32 matmul could also be a culprit.

In that case, in which mode were your experiments trained, TF32 or FP32?
Thank you!

Try to enable TF32. Also, use a longer warmup of 20 epochs.
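For the warmup part of this suggestion, the idea is to ramp the learning rate up gradually over the first 20 epochs instead of starting at the full value. The function below is a minimal sketch of a linear warmup, assuming a per-epoch schedule; `warmup_lr` and its parameters (`base_lr`, `init_lr`, `warmup_epochs`) are hypothetical names for illustration and do not reflect the cvnets scheduler or its configuration keys.

```python
def warmup_lr(epoch: int, base_lr: float,
              warmup_epochs: int = 20, init_lr: float = 0.0) -> float:
    """Linearly interpolate from init_lr to base_lr during warmup.

    After warmup_epochs the function just returns base_lr; a real schedule
    would hand over to its decay policy (e.g. cosine) at that point.
    """
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return base_lr
    fraction = epoch / warmup_epochs  # 0.0 at epoch 0, approaching 1.0
    return init_lr + fraction * (base_lr - init_lr)


if __name__ == "__main__":
    # Illustrative values only, not taken from any cvnets config.
    for epoch in (0, 5, 10, 20, 30):
        print(epoch, warmup_lr(epoch, base_lr=1.0))
```

A longer warmup mostly affects the early, least stable part of training, which is where small numerical differences between machines are more likely to be amplified.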

> Try to enable TF32. Also, use a longer warmup of 20 epochs.

Thanks! I will run an experiment with those settings!