Performance as a function of the batch size

Question

Hi, thanks for releasing this cool work!
I have a question about Fig. 4 in your paper and the related paragraph. Since you train for 25k iterations with batch_size = 8, do you also increase (or decrease) the number of iterations when the batch size decreases (or increases, respectively), or is it always kept fixed at 25k?
Thank you in advance for the answer.

Answer 1 · 2022-10-26T11:25:43.000Z

Great question. For the low-batch-size models I had to train longer to give them a fair chance. (With a lower batch size, validation performance is sometimes still increasing after 25k iterations.) I think B2 and B4 had 50k iterations, and B8 maybe 40k.
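
For anyone comparing these runs, a quick way to see what the longer schedules mean is to count the total number of training samples each model sees (batch size times iterations). This is only a rough sketch: the iteration counts are the approximate values recalled above, and the `samples_seen` helper and `schedule` dict are hypothetical names for illustration, not part of the released code.

```python
# Approximate iteration counts as recalled in the answer above
# ("I think" / "maybe"), keyed by batch size. Not exact values.
schedule = {2: 50_000, 4: 50_000, 8: 40_000}


def samples_seen(batch_size: int, iterations: int) -> int:
    """Total training samples processed: one batch per iteration."""
    return batch_size * iterations


for batch_size, iterations in schedule.items():
    total = samples_seen(batch_size, iterations)
    print(f"B{batch_size}: {iterations} iterations, {total} samples seen")
```

Even with the extended schedules, the smaller batch sizes still see fewer samples in total under these numbers, so the extra iterations only partly compensate for the smaller batches.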