How many A100s did training cost, and if I want to train on V100s, how many would be needed?
Closed this issue · 3 comments
Yang-bug-star commented
How many A100s did training cost, and if I want to train on V100s, how many would be needed?
JunZhan2000 commented
In the pre-training phase, we used 96 A100 GPUs with 4,500 tokens per GPU, set the gradient accumulation steps to 8, and trained for 81k steps. Training took about one week. The instruction-tuning phase requires far fewer computational resources.
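From the numbers above, the effective batch size and total token count can be worked out directly (a back-of-the-envelope sketch; it assumes "4,500 tokens per GPU" means tokens per GPU per micro-batch, which the comment does not state explicitly):

```python
# Figures quoted in the comment above.
num_gpus = 96
tokens_per_gpu = 4_500          # assumed: per GPU per micro-batch
grad_accum_steps = 8
train_steps = 81_000

# Tokens consumed per optimizer step (one step = grad_accum_steps micro-batches).
tokens_per_step = num_gpus * tokens_per_gpu * grad_accum_steps
print(f"tokens per optimizer step: {tokens_per_step:,}")   # 3,456,000

# Total tokens seen over the full pre-training run.
total_tokens = tokens_per_step * train_steps
print(f"total training tokens: {total_tokens:,}")          # 279,936,000,000 (~280B)
```

Under that reading, the run covers roughly 280 billion tokens.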
PyChaser commented
Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?
JunZhan2000 commented
> Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?
80GB