Training time needed to match the performance of the pre-trained checkpoint
hwopark opened this issue · 1 comment
hwopark commented
In the paper, the model was optimized using Adam on 8 NVIDIA 3090 GPUs with a batch size of 16.
How many epochs are needed to achieve performance comparable to the released pre-trained checkpoint? Also, could you share how long that training takes?