OpenLMLab/MOSS-RLHF

Training on 8 Nvidia RTX A6000

Opened this issue · 1 comment

Hi Authors, thank you so much for your huge contribution!! I'm pretty new to the optimization workarounds for training large models, and I'm struggling to get training for Llama-7B started on my setup (8 Nvidia RTX A6000s, each with 48 GB of GPU memory). What would you recommend changing in the optimization config to get training working in this case? Thank you so much!

Thank you very much for your interest in this project, and I apologize for the delayed reply.

We use ZeRO-3 and offload the parameters to the CPU, the batch size is set to 2, and memory usage is around 54 GB.
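
For reference, a minimal DeepSpeed ZeRO-3 config along those lines might look like the sketch below. This is not the exact config shipped with this repo: the micro batch size of 2 and the CPU parameter offload follow the reply above, while the bf16 precision, gradient accumulation steps, and remaining ZeRO options are assumptions you would adapt to your own launcher.

```json
{
  "train_micro_batch_size_per_gpu": 2,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

Since the ~54 GB figure quoted above is more than a single A6000's 48 GB, on your setup you would likely also need to offload the optimizer state to CPU (an `offload_optimizer` block with `"device": "cpu"`) and/or drop the micro batch size to 1, but this is a guess rather than a tested configuration.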