wxjiao/ParroT

What should be modified if I finetune the LLaMA with 2 A100 GPUs?


Hi Wenxiang,

I find your work very useful and interesting. Thanks for your efforts!

I wonder which parts of the LoRA fine-tuning example need to be modified if I only have 2 A100 GPUs (other than --nproc_per_node).

wxjiao commented

Hi Yuran,

Make sure that --per_device_train_batch_size × --gradient_accumulation_steps × --nproc_per_node equals the global batch size you want.
Since you are using fewer GPUs with ZeRO-2, you are likely to run into OOM issues. If so, reduce --per_device_train_batch_size and increase --gradient_accumulation_steps accordingly.
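
For reference, here is a minimal sketch of that batch-size arithmetic in Python. The concrete numbers (8 GPUs, per-device batch size 4, 8 accumulation steps) are illustrative assumptions, not the actual values from the example script:

```python
# Sketch of keeping the global batch size fixed when moving to fewer GPUs.
# The starting numbers below are assumptions for illustration only.

def global_batch_size(per_device_bs: int, grad_accum: int, n_gpus: int) -> int:
    """Effective global batch = per-device batch x accumulation steps x GPUs."""
    return per_device_bs * grad_accum * n_gpus

# Hypothetical original setup: 8 GPUs.
target = global_batch_size(per_device_bs=4, grad_accum=8, n_gpus=8)  # 256

# With only 2 GPUs, raise --gradient_accumulation_steps to hit the same target;
# lower --per_device_train_batch_size further if ZeRO-2 still runs out of memory.
per_device_bs, n_gpus = 4, 2
grad_accum = target // (per_device_bs * n_gpus)  # 32

assert global_batch_size(per_device_bs, grad_accum, n_gpus) == target
print(f"--per_device_train_batch_size {per_device_bs} "
      f"--gradient_accumulation_steps {grad_accum} --nproc_per_node {n_gpus}")
```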

Best,
Wenxiang

Thanks for your quick reply! I'll have a try.