Training 70b model
Yummy416 opened this issue · 1 comment
If I want to SFT a 70b model, how should I set the parameters? For full-parameter training, your framework currently requires me to allocate a batch size of at least 1 to each GPU. What if I want multiple GPUs to share a single batch?
Thanks for your interest in LMFlow! Currently, the batch size is required to be at least 1 per GPU. The extra data this implies should not add much memory cost.
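For a rough sense of how the per-GPU minimum composes with data parallelism and gradient accumulation, here is a minimal arithmetic sketch (the variable names are illustrative, not LMFlow arguments):

```python
# Illustrative arithmetic only; variable names are hypothetical, not LMFlow flags.
num_gpus = 8                       # GPUs in the server
per_device_train_batch_size = 1    # current minimum of 1 sample per GPU
gradient_accumulation_steps = 16   # accumulate micro-batches to enlarge the effective batch

effective_batch_size = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 128
```

In other words, even with the 1-sample-per-GPU floor, the effective batch size is controlled by the number of GPUs and the accumulation steps rather than by raising the per-device value.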
In our experience, the major memory cost comes from the model itself, which is already handled by LMFlow's model parallelism mechanism and split into smaller parts across GPUs. Considering the overhead, we recommend a GPU server with at least 2.5 TB of total memory (RAM + GPU memory) to run full fine-tuning of the 70b model, or using multi-node training or LoRA training instead. Hope that helps 🙏
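To see why a figure on the order of 2.5 TB is suggested, a back-of-the-envelope sketch of the model and optimizer state (assumptions: bf16 weights and gradients, fp32 master weights, fp32 Adam moments; activations and framework overhead excluded; not LMFlow-specific numbers):

```python
# Rough memory estimate for full fine-tuning a 70B-parameter model.
params = 70e9
bytes_per_param = 2 + 2 + 4 + 4 + 4   # bf16 weights, bf16 grads, fp32 master weights, Adam m, Adam v
model_state_tb = params * bytes_per_param / 1e12
print(f"~{model_state_tb:.2f} TB of model/optimizer state")  # ~1.12 TB
```

Activations, communication buffers, and general runtime overhead come on top of this, which is why the recommended total memory above is comfortably larger than the raw state size.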