adjawdka opened this issue 7 days ago · 0 comments
When I run the code to reproduce the results, the training accuracy is slightly lower than the paper reports, and my GPU cannot handle large batch sizes. How should I set up gradient accumulation to compensate for the smaller batch size?
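For context, here is a minimal, framework-free sketch of what gradient accumulation means here. It is not this repository's training loop: the toy model `y = w * x`, the helper names `grad_mse` and `accumulated_grad`, and the batch sizes are all hypothetical. The idea is to split one large batch into micro-batches that fit on the GPU, scale each micro-batch gradient by `1 / accum_steps`, and apply a single optimizer update after the last micro-batch, so the effective batch size is `micro_batch_size * accum_steps`.

```python
# Framework-free sketch of gradient accumulation on a toy linear model
# y = w * x with mean squared error. All names here are hypothetical and
# are not taken from the repository.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is scaled by 1 / accum_steps so that the
    running sum equals the gradient of the mean loss over the full batch.
    """
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    accum_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(accum_steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total, accum_steps


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # toy targets generated with w = 2
w = 0.5

full = grad_mse(w, xs, ys)  # gradient from one batch of 8
acc, accum_steps = accumulated_grad(w, xs, ys, micro_batch_size=2)  # 4 micro-batches of 2

# One weight update per accumulated gradient, not one per micro-batch.
learning_rate = 0.01  # hypothetical value
w_updated = w - learning_rate * acc
```

In a PyTorch-style loop the same pattern would usually mean calling `backward()` on `loss / accum_steps` for every micro-batch and calling `optimizer.step()` followed by `optimizer.zero_grad()` only once every `accum_steps` micro-batches. Whether that matches the setup used in the paper is an assumption I cannot confirm.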