When conducting SFT experiments, setting batch_size_train to 1 or 2 results in the same memory usage.
tiesanguaixia opened this issue · 0 comments
tiesanguaixia commented
Thank you for your excellent paper and open-source code. When running instruction tuning of the TimeChat model on 4 × V100 GPUs, I keep world_size==4 and accum_grad_iters==8 unchanged, but whether batch_size_train is set to 1 or 2, the memory usage appears to be the same: in both cases the memory of every V100 GPU is almost completely filled. What could be the reason for this? Thank you very much!
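For reference, here is a minimal sketch of how I think the per-GPU memory could be inspected during training. This is not taken from the TimeChat codebase; the hook point and variable names are assumptions. It distinguishes the memory actually allocated to tensors from the memory reserved by PyTorch's caching allocator, which is closer to what nvidia-smi reports.

```python
import torch
import torch.distributed as dist


def log_cuda_memory(step: int) -> None:
    """Log allocated vs. reserved CUDA memory for the current rank.

    `memory_allocated` counts live tensors only; `memory_reserved` includes
    the caching allocator's pool, which is roughly what nvidia-smi shows.
    """
    rank = dist.get_rank() if dist.is_initialized() else 0
    allocated = torch.cuda.memory_allocated() / 1024 ** 3
    reserved = torch.cuda.memory_reserved() / 1024 ** 3
    peak = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[rank {rank}] step {step}: "
          f"allocated={allocated:.2f} GiB, "
          f"reserved={reserved:.2f} GiB, "
          f"peak allocated={peak:.2f} GiB")


# Hypothetical usage inside the training loop (names are illustrative only):
# for step, batch in enumerate(dataloader):
#     loss = model(batch)["loss"] / accum_grad_iters
#     loss.backward()
#     ...
#     log_cuda_memory(step)
```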