About GPU OOM
Closed this issue · 3 comments
Hi,
I ran "llama2_7b.sh" following your steps on a server with 3 available A100/80GB GPUs, but found that with your default DeepSpeed option --per_device_train_batch_size 4 the GPUs go OOM; the maximum I can set is --per_device_train_batch_size 3. I wonder if this is the expected behavior?
Thanks
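For reference, a sketch of the workaround described above, lowering the per-GPU batch size in the launch command. Here `train.py` is a placeholder for the repo's actual entrypoint inside llama2_7b.sh, and --gradient_accumulation_steps is a standard HuggingFace TrainingArguments flag suggested here only as a hypothetical way to keep the effective batch size up; whether this script forwards it is an assumption:

```shell
# Lower the per-GPU batch size from 4 to 3 to avoid OOM on 3x A100/80GB.
# train.py is a placeholder for the script's real entrypoint.
deepspeed train.py \
  --per_device_train_batch_size 3 \
  --gradient_accumulation_steps 2  # assumed flag: effective batch = 3 GPUs x 3 x 2 = 18
```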
Using 3 A100/80GB GPUs for the knowledge distillation process may be a relatively low-resource setup. A batch size of 3 may be OK (as long as there is no OOM), but I have not verified this myself. :-)
Thank you!
Yes, with --per_device_train_batch_size 3, 3 A100/80GB GPUs seem OK; GPU RAM usage goes up to 80406/81920 MiB.