xuyuzhuang11/OneBit

About GPU OOM

Closed this issue · 3 comments

Hi,

I ran "llama2_7b.sh" following your steps on a server with 3 available A100 (80 GB) GPUs, but found that with your default DeepSpeed option --per_device_train_batch_size 4 the GPUs go OOM; the maximum I can set is --per_device_train_batch_size 3. I wonder if this is the expected behavior?

Thanks

3 A100/80GB GPUs may be relatively low-resource for the knowledge distillation process. A batch size of 3 is probably fine (as long as there is no OOM), but I cannot say for certain. :-)

Thank you!

Yes, with --per_device_train_batch_size 3, 3 A100/80GB GPUs seem OK; GPU RAM usage goes up to 80406/81920 MiB.
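For anyone hitting the same OOM, a minimal sketch of the adjusted launch follows. Only --per_device_train_batch_size is confirmed by this thread; the deepspeed launcher invocation, script name, and the use of --gradient_accumulation_steps (a standard HF Trainer flag for preserving the effective batch size) are assumptions about what llama2_7b.sh contains:

```shell
# Sketch: lowering the per-GPU batch size to fit 3x A100/80GB.
# Assumed launcher call and script name; only the batch-size flag
# is confirmed by this thread.
deepspeed --num_gpus=3 train.py \
  --per_device_train_batch_size 3 \
  --gradient_accumulation_steps 4  # optional: raise to recover the effective batch size
```

With accumulation over 4 steps, the effective batch size becomes 3 GPUs x 3 per device x 4 = 36 samples per optimizer step, at the cost of slower training.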