What should be modified if I fine-tune LLaMA with 2 A100 GPUs?
Yuran-Zhao commented
Hi Wenxiang,
I find your work very useful and interesting. Thanks for your efforts!
I wonder which part of the LoRA fine-tuning example needs to be modified if I only have 2 A100 GPUs (other than `--nproc_per_node`).
wxjiao commented
Hi Yuran,
Make sure that `--per_device_train_batch_size` x `--gradient_accumulation_steps` x `--nproc_per_node` equals the global batch size you want.
Since you are using fewer GPUs with ZeRO2, you are likely to encounter OOM issues. If so, try reducing `--per_device_train_batch_size` and increasing `--gradient_accumulation_steps` accordingly.
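For example, here is a quick sanity check of how the three flags multiply to the global batch size; the numbers below are hypothetical and just for illustration, so substitute the values from your own run:

```python
# Hypothetical numbers for illustration; not the repo's defaults.
def global_batch_size(per_device_bs: int, grad_accum_steps: int, n_gpus: int) -> int:
    # Effective batch size seen by the optimizer per update step.
    return per_device_bs * grad_accum_steps * n_gpus

# Suppose the original recipe used 8 GPUs, per-device batch size 4, accumulation 8.
assert global_batch_size(4, 8, 8) == 256

# On 2 GPUs, scale up --gradient_accumulation_steps so the product is unchanged.
assert global_batch_size(4, 32, 2) == 256
```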
Best,
Wenxiang
Yuran-Zhao commented
Thanks for your quick reply! I'll have a try.