What should be modified if I fine-tune LLaMA with 2 A100 GPUs?
Yuran-Zhao commented
Hi Wenxiang,
I find your work very useful and interesting. Thanks for your efforts!
I wonder which part of the LoRA fine-tuning example needs to be modified if I only have 2 A100 GPUs (other than `--nproc_per_node`).
wxjiao commented
Hi Yuran,
Make sure that `--per_device_train_batch_size` x `--gradient_accumulation_steps` x `--nproc_per_node` equals the global batch size you want.
Since you are using fewer GPUs with ZeRO2, you are likely to encounter OOM issues. If so, try reducing `--per_device_train_batch_size` and increasing `--gradient_accumulation_steps` accordingly.
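For example, here is a quick sanity check of how the three flags multiply to the global batch size; the numbers below are hypothetical and just for illustration, so substitute the values from your own run:

```python
# Hypothetical numbers for illustration; not the repo's defaults.
def global_batch_size(per_device_bs: int, grad_accum_steps: int, n_gpus: int) -> int:
    # Effective batch size seen by the optimizer per update step.
    return per_device_bs * grad_accum_steps * n_gpus

# Suppose the original recipe used 8 GPUs, per-device batch size 4, accumulation 8.
assert global_batch_size(4, 8, 8) == 256

# On 2 GPUs, scale up --gradient_accumulation_steps so the product is unchanged.
assert global_batch_size(4, 32, 2) == 256
```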
Best,
Wenxiang
Yuran-Zhao commented
Thanks for your quick reply! I'll have a try.