Fine Tuning LLAMA2-13b on ml.g4dn.12xlarge is taking too much time.
hz-nm opened this issue · 5 comments
So I was fine-tuning LLAMA2 13B on a different dataset. I used the code, tweaking it a little just to preprocess that specific dataset, then ran it via a SageMaker training job.
The training was running fine, but it was very slow: even after 24 hours it had only reached 7% on an ml.g4dn.12xlarge instance.
Can anyone please guide me on how I can increase the training speed?
Unfortunately I cannot use "ml.g5.4xlarge", since that training instance is not available in the region I am working in right now.
Thanks.
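(For reference, here is a minimal sketch of how such a training job might be launched with the SageMaker Python SDK's Hugging Face estimator; the entry point, source directory, IAM role, framework versions, hyperparameters, and S3 path below are placeholders, not the actual setup from this issue.)

```python
# Hypothetical launch script -- all names and versions below are assumptions,
# not taken from this issue.
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",            # placeholder fine-tuning script
    source_dir="./scripts",            # placeholder directory with the tweaked code
    instance_type="ml.g4dn.12xlarge",  # the instance type discussed in this issue
    instance_count=1,
    role="arn:aws:iam::<account-id>:role/<sagemaker-role>",  # placeholder IAM role
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters={"epochs": 3, "per_device_train_batch_size": 2},
)

# Placeholder S3 location for the preprocessed dataset.
estimator.fit({"training": "s3://<bucket>/<dataset-prefix>/"})
```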
g4 instances are very old and not built for training. You should try p3 instead.
Thank you for replying. I will try p3 instances and report back on the results so that you can update the chart as well.
Again, many thanks.
Will the p3.2xlarge be sufficient for training and merging, or is its GPU memory too little?
Here are the specs:
- vCPUs: 8
- Instance memory: 61 GiB
- Total GPU memory: 16 GB

p3.8xlarge has 4 GPUs and a similar increase in instance memory.
Not sure; with PEFT and int-4 it might work, but using an instance with more GPUs would be better.
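(Not from the thread, but as a rough back-of-envelope check: 13B parameters at 4 bits is roughly 6.5 GB of weights, which can fit on a single 16 GB GPU with room left for LoRA adapters, optimizer state, and activations at a small batch size, whereas fp16 weights alone would be around 26 GB and would not fit. Below is a minimal QLoRA-style sketch of what "PEFT and int-4" could look like with transformers, bitsandbytes, and peft; the model id, LoRA hyperparameters, and target modules are assumptions.)

```python
# Rough sketch of 4-bit (QLoRA-style) fine-tuning with PEFT -- hyperparameters
# and target modules below are assumptions, not from this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-13b-hf"  # assumed model id

# Load the base model with int-4 (NF4) quantized weights via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prepare the quantized model for training and attach small LoRA adapters,
# so only a tiny fraction of parameters is trained.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed LoRA target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

One caveat on the "merging" part of the question: merging the LoRA adapter back into the base model (e.g. with PEFT's `merge_and_unload()`) is typically done on fp16 weights, so that step may need more than 16 GB of GPU memory or may have to run on CPU.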
Thanks again. I will update here soon.