iteration-time increases linearly when micro_batch_size=1
LlinWing opened this issue · 1 comment
I previously reported this issue in #22.
After extensive investigation, I found that it only occurs when setting micro_batch_size=1, so I am opening a new issue to emphasize this point.
I believe you can reproduce it: I pulled your latest code and ran it with micro_batch_size=1, and the problem still persisted. It returned to normal after setting micro_batch_size to 2. (Both experiments used tp=2 and pp=4 on 8× A100 40GB.)
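For concreteness, here is a minimal sketch of the two launch configurations described above, assuming Megatron-style CLI flags (the script name and exact flag spellings are illustrative and may differ from this repo's actual arguments):

```bash
# Hypothetical launch sketch; flag names follow Megatron-style conventions.

# Failing case: per-iteration wall time grows roughly linearly over training
torchrun --nproc_per_node=8 pretrain.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 4 \
    --micro-batch-size 1

# Working case: per-iteration wall time stays constant
torchrun --nproc_per_node=8 pretrain.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 4 \
    --micro-batch-size 2
```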
thanks, that's very interesting, and might explain why we never ran into the problem ourselves during real training runs. will investigate more.
in the meantime, all models seem to train fine with the configs we recommend in the docs