About the training speed
I found that the total number of training iterations is 400,000. May I ask how many days it takes you to train a distilled model? Using 8×V100 GPUs, I found that I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).
With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory.
Training BK-SDM-{Small, Tiny} results in a 5∼10% decrease in GPU memory usage.
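For reference, the quoted figures imply the following per-step cost. This is only a back-of-the-envelope sketch using the numbers above; the variable names are illustrative, not from the repo:

```python
# Per-step cost implied by the quoted BK-SDM-Base numbers (50K iterations).
iters = 50_000

hours_batch_256 = 300  # effective batch 256 (=4x64)
hours_batch_64 = 60    # effective batch 64 (=4x16)

sec_per_iter_256 = hours_batch_256 * 3600 / iters  # 21.6 s per step
sec_per_iter_64 = hours_batch_64 * 3600 / iters    # 4.32 s per step
print(sec_per_iter_256, sec_per_iter_64)
```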
It seems that training BK-SDM-Base for the full 400K iterations would take 300 h × (400K / 50K) = 2,400 h.
Hi, we would like to clarify our setting.
> I found that the total number of training iterations is 400,000.
- No. Although our script specifies `--max_train_steps=400000`, we released the checkpoints at the exact 50,000th step, as described in our paper.
- The reason for setting a longer `max_train_steps` was to inspect the impact of iterations on model performance (see the sketch below).
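To make the relationship between a large `max_train_steps` and the released 50,000-step checkpoint concrete, here is a minimal loop sketch; `train_step` and `save_checkpoint` are hypothetical stand-ins, not BK-SDM's actual training code:

```python
# Sketch: train toward a large max_train_steps while keeping intermediate
# checkpoints; the released BK-SDM weights correspond to the 50,000-step one.

def train_step(step: int) -> None:
    pass  # placeholder for one distillation update

def save_checkpoint(step: int) -> None:
    print(f"saved checkpoint at step {step}")  # hypothetical helper

max_train_steps = 400_000
checkpoint_every = 50_000

for step in range(1, max_train_steps + 1):
    train_step(step)
    if step % checkpoint_every == 0:
        save_checkpoint(step)
```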
> I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).
- One night from 19:55 to 10:00 the next day ≈ 14 h
- 50,000 iter / 3,800 iter × 14 h ≈ 184.21 h
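The same extrapolation expressed as a small helper over the observed throughput; the numbers come from this thread, and the function itself is purely illustrative:

```python
# Extrapolate wall-clock hours for a target iteration count from the
# observed overnight throughput reported above.
def estimated_hours(target_iters: int, observed_iters: int = 3800,
                    observed_hours: float = 14.0) -> float:
    return target_iters / observed_iters * observed_hours

print(f"{estimated_hours(50_000):.2f} h")   # ~184.21 h, matching the reply
print(f"{estimated_hours(400_000):.0f} h")  # ~1474 h at the same throughput
```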
Though our models were trained on a single A100 GPU, using multiple GPUs with a smaller per-GPU batch size can accelerate training.
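As a sketch of that trade-off: the effective batch size is the product of GPU count, per-GPU batch size, and gradient-accumulation steps, so a multi-GPU run can keep the same effective batch with a smaller per-GPU batch. The 8-GPU split below is a hypothetical example, not a tested configuration, and the single-GPU line assumes the 4× factor in "256 (=4×64)" is gradient accumulation:

```python
# Effective batch = GPUs x per-GPU batch x gradient-accumulation steps.
def effective_batch(num_gpus: int, per_gpu_batch: int, grad_accum: int) -> int:
    return num_gpus * per_gpu_batch * grad_accum

assert effective_batch(1, 64, 4) == 256  # single-A100 setting quoted above
assert effective_batch(8, 32, 1) == 256  # hypothetical split across 8 GPUs
```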