RenShuhuai-Andy/TimeChat

When conducting SFT experiments, setting batch_size_train to 1 or 2 results in the same memory usage.

tiesanguaixia opened this issue · 0 comments

Thank you for your excellent paper and open-source code. I would like to ask: when using 4 * V100 GPUs for instruction tuning of the TimeChat model, I keep world_size==4 and accum_grad_iters==8 unchanged, but whether batch_size_train is set to 1 or 2, the memory usage seems to be the same, almost filling up the memory of every V100 GPU. What could be the reason for this? Thank you a lot!
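
For reference, here is a minimal sketch (not TimeChat's own code; the helper name and its placement are my assumptions) of how one might compare the memory PyTorch has actually allocated under the two batch sizes, since nvidia-smi reports the caching allocator's reserved memory, which can look identical even when live tensor usage differs:

```python
import torch

def report_gpu_memory(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory for the current device.

    nvidia-smi shows reserved (cached) memory, which can appear "full"
    even when the live tensor allocations differ between runs.
    """
    allocated = torch.cuda.memory_allocated() / 1024**3       # tensors currently in use
    peak = torch.cuda.max_memory_allocated() / 1024**3        # peak live allocation so far
    reserved = torch.cuda.max_memory_reserved() / 1024**3     # held by the caching allocator
    print(f"[{tag}] allocated={allocated:.2f} GiB, "
          f"peak_allocated={peak:.2f} GiB, reserved={reserved:.2f} GiB")

# Hypothetical placement inside the training loop, after the backward pass:
# loss.backward()
# report_gpu_memory("after backward, batch_size_train=1")
```

Comparing peak_allocated between the batch_size_train=1 and batch_size_train=2 runs would show whether the activation memory actually grows with batch size, or whether the footprint is dominated by the model weights and optimizer states.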