sunxm2357/DIME-FM

Batch size and number of GPUs

Opened this issue · 1 comment

In the paper you mention batch sizes of 8192 and 12288.
But in the training script the batch size is 640 with num_workers = 8, which gives 640 × 8 = 5120.

What was the actual batch size, and how many GPUs were used to train the models in the paper?

For the main experiment distilling ViT-B, the total batch size is 12288 across 24 GPUs.
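
A note on the arithmetic in the question: in a typical multi-GPU data-parallel run, the global batch size is the per-GPU batch size multiplied by the number of GPUs; `num_workers` only sets the number of DataLoader worker processes per process and does not scale the batch size. Below is a minimal sketch using the numbers from the reply; the variable names are illustrative and not taken from the DIME-FM training script.

```python
# Illustrative only: relation between global and per-GPU batch size in a
# standard data-parallel (e.g. PyTorch DDP) run. num_workers is the number
# of DataLoader worker processes and does NOT multiply the batch size.
num_gpus = 24                  # from the authors' reply
global_batch_size = 12288      # from the paper / reply

per_gpu_batch_size = global_batch_size // num_gpus
print(per_gpu_batch_size)      # 512 samples per GPU per step
```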