philschmid/sagemaker-huggingface-llama-2-samples

Fine Tuning LLAMA2-13b on ml.g4dn.12xlarge is taking too much time.

hz-nm opened this issue · 5 comments

hz-nm commented

So I was fine-tuning LLaMA 2 13B on a different dataset. I used the code from this repo, tweaked it a little to preprocess that specific dataset, then ran it as a SageMaker training job.
The training was running fine but very slowly: even after 24 hours it had only reached 7% on an ml.g4dn.12xlarge instance.
Can anyone please guide me on how I can speed up training?
Unfortunately I cannot use "ml.g5.4xlarge", since that instance type is not available in the region I am working in right now.
Thanks.

g4 instances are very old and not built for training. You should try p3 instead.

hz-nm commented

Thank you for replying. I will try p3 instances and report back with the results so that you can update the chart as well.
Again, many thanks.

hz-nm commented

Will the p3.2xlarge be sufficient for training and merging, or is its GPU memory too little?
Here are the specs:
vCPUs: 8
Instance memory: 61 GiB
Total GPU memory: 16 GB

The p3.8xlarge has 4 GPUs and a similar increase in instance memory.
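For reference, a rough back-of-envelope estimate of whether the 13B weights alone fit in 16 GB. This assumes all parameters are 4-bit quantized; activations, gradients, LoRA adapters, and optimizer state add real overhead on top of this, so it is only a lower bound:

```python
# Back-of-envelope GPU memory estimate for LLaMA 2 13B weights.
PARAMS = 13_000_000_000  # parameter count (approximate)

def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

fp16 = weight_memory_gb(PARAMS, 2.0)  # half precision: well over 16 GB
int4 = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantized: leaves some headroom
print(f"fp16 weights: ~{fp16:.1f} GB, int4 weights: ~{int4:.1f} GB")
```

So in fp16 the weights alone do not fit on a single 16 GB GPU, while int-4 leaves roughly half the card free for everything else.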

Not sure; with PEFT and int-4 it might fit, but using an instance with more GPUs would be better.

hz-nm commented

Thanks again. I will update here soon.