OpenMOSS/AnyGPT

how many a100 it cost in training and if i want to train on v100 what is the number needed ?

Closed this issue · 3 comments

In the pre-training phase, we used 96 A100 GPUs with 4,500 tokens per GPU, set the gradient accumulation steps to 8, and trained for 81k steps. The training took about one week. The computational resources required for the instruction-tuning phase are much smaller.
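The numbers above imply a rough total token budget. As a sketch (assuming "4,500 tokens per GPU" is the per-device micro-batch size, so each optimizer step sees 96 GPUs × 4,500 tokens × 8 accumulation steps):

```python
# Hypothetical back-of-the-envelope estimate from the figures quoted above;
# the exact batching scheme is an assumption, not confirmed by the authors.
gpus = 96                 # A100 80GB GPUs
tokens_per_gpu = 4_500    # tokens per device per micro-batch
grad_accum = 8            # gradient accumulation steps
optimizer_steps = 81_000  # total training steps

tokens_per_step = gpus * tokens_per_gpu * grad_accum
total_tokens = tokens_per_step * optimizer_steps

print(f"tokens per optimizer step: {tokens_per_step:,}")   # 3,456,000
print(f"total training tokens:     {total_tokens:,}")      # ~280B
```

Under these assumptions, pre-training covers roughly 280 billion tokens; to match the same budget on V100s (32GB, no bf16), you would need to shrink the per-device batch and raise the GPU count and/or accumulation steps proportionally.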

Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?

80GB