Training time
BaohaoLiao opened this issue · 2 comments
I couldn't find the training time in your paper.
From your training script, it seems you use 1 GPU for Llama 7B and 4 GPUs for Llama 70B. May I ask what the training time is for Llama 7B and 70B on C4? And which GPU type do you use, the A100 80GB or the 40GB version?
Hi,
This depends on the task, but all experiments should fit within one A100 (80GB) GPU. I remember something like half a day for 7B models and 5-6 days for 70B models. 7B models and (some?) 70B models (with small enough bits) could also fit within one A6000 (~48-49GB). I suspect 40GB should be fine, especially for 7B models, though I don't think I have tried it.
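As a rough back-of-envelope (weights only; activations, optimizer state, and any trainable parameters add on top of this, and depend on the exact setup), the base weights scale with parameter count and bit-width, which is why low-bit 70B runs can still fit on a single card:

```python
# Back-of-envelope memory for storing model weights alone, ignoring
# activations, optimizer state, and trainable parameters.
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Memory in GB for num_params weights stored at `bits` bits each."""
    return num_params * bits / 8 / 1e9

for name, params in [("7B", 7e9), ("70B", 70e9)]:
    for bits in (16, 4, 2):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
# 7B:  ~14.0 GB (16-bit), ~3.5 GB (4-bit), ~1.8 GB (2-bit)
# 70B: ~140.0 GB (16-bit), ~35.0 GB (4-bit), ~17.5 GB (2-bit)
```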
We do support multi-GPU training, but that is mostly useful if you want to speed up training; see the sketch below.
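For reference, this is the standard PyTorch DDP launch pattern, not our actual training script (the script name, model, and flags below are placeholders):

```python
"""Minimal multi-GPU data-parallel sketch; launch with
`torchrun --nproc_per_node=4 this_script.py` (script name is a placeholder)."""
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK and the rendezvous environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Tiny stand-in model; a real run would build the actual LLM here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()  # dummy objective for illustration
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```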
Thank you for the details!