philschmid/sagemaker-huggingface-llama-2-samples

Fine Tuning LLAMA2-13b on ml.g4dn.12xlarge is taking too much time.

hz-nm opened this issue · 5 comments

hz-nm commented

So I was fine-tuning LLaMA 2 13B on a different dataset. I used the code from this repo, tweaked it a little to preprocess that specific dataset, then ran it as a SageMaker training job.
The training was running fine but very slowly: even after 24 hours it had only reached 7% on an ml.g4dn.12xlarge instance.
Can anyone please guide me on how I can speed up training?
Unfortunately I cannot use "ml.g5.4xlarge", since that instance type is not available in the region I am working in right now.
Thanks.

g4 instances are very old and not built for training. You should try p3 instead.

hz-nm commented

Thank you for replying. I will try p3 instances and report back with the results so that you can update the chart as well.
Again, many thanks.

hz-nm commented

Will the p3.2xlarge be sufficient for training and merging, or is its GPU memory too little?
Here are the specs:
vCPUs: 8
Instance memory: 61 GiB
Total GPU memory: 16 GB

The p3.8xlarge has 4 GPUs and a similar increase in instance memory.
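For reference, a rough back-of-envelope estimate of whether the 13B weights alone fit in 16 GB. This assumes all parameters are 4-bit quantized; activations, gradients, LoRA adapters, and optimizer state add real overhead on top of this, so it is only a lower bound:

```python
# Back-of-envelope GPU memory estimate for LLaMA 2 13B weights.
PARAMS = 13_000_000_000  # parameter count (approximate)

def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

fp16 = weight_memory_gb(PARAMS, 2.0)  # half precision: well over 16 GB
int4 = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantized: leaves some headroom
print(f"fp16 weights: ~{fp16:.1f} GB, int4 weights: ~{int4:.1f} GB")
```

So in fp16 the weights alone do not fit on a single 16 GB GPU, while int-4 leaves roughly half the card free for everything else.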

Not sure; with PEFT and int-4 it might fit, but using an instance with more GPUs would be better.

hz-nm commented

Thanks again. I will update here soon.