How many A100s did training cost, and if I want to train on V100s, how many would be needed?
Closed this issue · 3 comments
Yang-bug-star commented
How many A100s did training cost, and if I want to train on V100s, how many would be needed?
JunZhan2000 commented
In the pre-training phase, we used 96 A100 GPUs with 4,500 tokens per GPU, set the gradient accumulation steps to 8, and trained for 81k steps. Training took about one week. The instruction-tuning phase requires far fewer computational resources.
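From the numbers above, the effective batch size and total token count can be worked out directly (a back-of-the-envelope sketch; it assumes "4,500 tokens per GPU" means tokens per GPU per micro-batch, which the comment does not state explicitly):

```python
# Figures quoted in the comment above.
num_gpus = 96
tokens_per_gpu = 4_500          # assumed: per GPU per micro-batch
grad_accum_steps = 8
train_steps = 81_000

# Tokens consumed per optimizer step (one step = grad_accum_steps micro-batches).
tokens_per_step = num_gpus * tokens_per_gpu * grad_accum_steps
print(f"tokens per optimizer step: {tokens_per_step:,}")   # 3,456,000

# Total tokens seen over the full pre-training run.
total_tokens = tokens_per_step * train_steps
print(f"total training tokens: {total_tokens:,}")          # 279,936,000,000 (~280B)
```

Under that reading, the run covers roughly 280 billion tokens.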
PyChaser commented
Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?
JunZhan2000 commented
> Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?
80GB