Nota-NetsPresso/BK-SDM

About the training speed

Closed this issue · 3 comments

KeyKy commented

I found that the total number of iterations for the training is 400,000. May I ask how many days it took you to train a distilled model? I am using 8×V100 GPUs, and I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).

KeyKy commented

With a batch size of 256 (=4×64), training BK-SDM-Base for 50K iterations takes about 300 hours and 53GB GPU memory. With a batch size of 64 (=4×16), it takes 60 hours and 28GB GPU memory.
Training BK-SDM-{Small, Tiny} results in a 5∼10% decrease in GPU memory usage.
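For intuition, these figures imply a per-iteration cost that can be checked with quick arithmetic. A rough sketch, assuming both timings refer to the same 50K-iteration run (the 4× factor presumably being gradient accumulation, which is an assumption here):

```python
# Back-of-envelope: per-iteration cost implied by the quoted figures.
# Assumption: both timings cover the same 50K-iteration run.
ITERS = 50_000

for batch, hours in [(256, 300), (64, 60)]:
    sec_per_iter = hours * 3600 / ITERS
    print(f"batch {batch}: {sec_per_iter:.1f} s/iter")
# batch 256: 21.6 s/iter
# batch 64:  4.3 s/iter
```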

KeyKy commented

It seems that BK-SDM-Base would take 300 h × (400K / 50K) = 2,400 h.

Hi, we would like to clarify our setup.

I found that the total number of iterations for the training is 400,000.

  • No. Although our script specifies --max_train_steps=400000, we released the checkpoints at exactly the 50,000-th step, as described in our paper (see the sketch after this list).
    • The reason for setting a longer max_train_steps was to inspect how the number of iterations affects model performance.
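A minimal sketch of this setup, assuming a training loop with periodic checkpointing; save_checkpoint and the 50K checkpoint interval are illustrative assumptions, not the repository's actual code:

```python
def save_checkpoint(step: int) -> None:
    # Hypothetical helper; stands in for the real checkpoint-saving logic.
    print(f"saved checkpoint at step {step}")

max_train_steps = 400_000     # value passed via --max_train_steps
checkpointing_steps = 50_000  # assumption: save a checkpoint every 50K steps

for step in range(1, max_train_steps + 1):
    ...  # one distillation training step would go here
    if step % checkpointing_steps == 0:
        save_checkpoint(step)  # the released weights correspond to step 50_000
```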

I can only complete around 3,800 iterations in one night (from 19:55 to 10:00 the next day).

  • One night, from 19:55 to 10:00 the next day, is about 14 h.
  • 50,000 iter / 3,800 iter × 14 h ≈ 184.21 h (see the check below).
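The same estimate as a quick script, using only the numbers reported in this thread:

```python
# Reproduce the ETA estimate above from the reported throughput.
iters_done = 3_800     # iterations completed overnight
elapsed_h = 14         # 19:55 to ~10:00 the next day
target_iters = 50_000  # released checkpoint step

eta_h = target_iters / iters_done * elapsed_h
print(f"{eta_h:.2f} h (~{eta_h / 24:.1f} days)")  # 184.21 h (~7.7 days)
```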

Though our models were trained on a single A100, using multiple GPUs with a smaller per-GPU batch size can speed up training.
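One way to think about this is keeping the effective batch size fixed while spreading work across GPUs. A hedged sketch; the specific splits below are illustrative, not the repository's settings:

```python
# Effective batch size = num_gpus * per_gpu_batch * grad_accum_steps.
# Both configurations below reach the same effective batch of 256.
def effective_batch(num_gpus: int, per_gpu_batch: int, grad_accum: int) -> int:
    return num_gpus * per_gpu_batch * grad_accum

print(effective_batch(1, 64, 4))  # single A100: 256
print(effective_batch(8, 16, 2))  # 8 GPUs, smaller per-GPU batch: 256
```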