Training time
BaohaoLiao opened this issue · 2 comments
I couldn't find the training time in your paper.
From your training script, it seems you use 1 GPU for Llama 7B and 4 GPUs for Llama 70B. May I ask what the training time is for Llama 7B and 70B on C4? And which GPU type do you use, the A100 80GB or the 40GB version?
Hi,
This depends on the task, but all experiments should fit within one A100 (80GB) GPU. I remember something like half a day for 7B models and 5-6 days for 70B models. 7B models and (some?) 70B models (with small enough bits) could also fit within one A6000 (~48-49GB). I suspect 40GB should be fine, especially for 7B models, though I don't think I have tried it.
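As a rough back-of-envelope (weights only; activations, optimizer state, and any trainable parameters add on top of this, and depend on the exact setup), the base weights scale with parameter count and bit-width, which is why low-bit 70B runs can still fit on a single card:

```python
# Back-of-envelope memory for storing model weights alone, ignoring
# activations, optimizer state, and trainable parameters.
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Memory in GB for num_params weights stored at `bits` bits each."""
    return num_params * bits / 8 / 1e9

for name, params in [("7B", 7e9), ("70B", 70e9)]:
    for bits in (16, 4, 2):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
# 7B:  ~14.0 GB (16-bit), ~3.5 GB (4-bit), ~1.8 GB (2-bit)
# 70B: ~140.0 GB (16-bit), ~35.0 GB (4-bit), ~17.5 GB (2-bit)
```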
We do support multi-GPU training, but that is mostly useful if you want to speed up training; see the sketch below.
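For reference, this is the standard PyTorch DDP launch pattern, not our actual training script (the script name, model, and flags below are placeholders):

```python
"""Minimal multi-GPU data-parallel sketch; launch with
`torchrun --nproc_per_node=4 this_script.py` (script name is a placeholder)."""
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK and the rendezvous environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Tiny stand-in model; a real run would build the actual LLM here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()  # dummy objective for illustration
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```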
Thank you for the details!