Batch size and number of GPUs
Opened this issue · 1 comments
shkarupa-alex commented
In the paper you mention batch sizes of 8192 and 12288.
But in the training script the batch size is 640 and num_workers is 8, which would give 640 × 8 = 5120.
What was the actual batch size, and how many GPUs were used to train the models from the paper?
sunxm2357 commented
For the main experiment distilling ViT-B, the batch size is 12288 across 24 GPUs.
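For reference, the global (effective) batch size is the per-GPU batch size multiplied by the number of GPUs, so the figures above imply 12288 / 24 = 512 per GPU. A minimal sketch of that arithmetic (the helper name is illustrative, not from the repo):

```python
def per_gpu_batch(global_batch: int, num_gpus: int) -> int:
    """Split a global batch size evenly across GPUs."""
    assert global_batch % num_gpus == 0, "global batch must divide evenly across GPUs"
    return global_batch // num_gpus

# The run described above: global batch 12288 on 24 GPUs.
print(per_gpu_batch(12288, 24))  # → 512
```

Note that `num_workers` in the training script is the number of dataloader worker processes per GPU, not the GPU count, so it does not factor into the effective batch size.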