TencentARC/LLaMA-Pro

Thanks for the wonderful project! Why do I always see an apparent loss of the model's original ability?

hzgdeerHo opened this issue · 8 comments

Finetuning llama-3-8B-instruct with the same configuration as the example at https://github.com/hiyouga/LLaMA-Factory/tree/3df986c6793a51ec2cb5f31fd1808cd3a9883bc4/examples/extras/llama_pro always leads to an apparent loss of its original ability. I only used the "Identity" training dataset. Can you help? Thanks!

The final training loss is about 0.05-0.1, so I think it might not be caused by overfitting?
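One way to narrow this down (a minimal sketch, not from this thread; the model paths are placeholders) is to check that the expanded but still-untrained checkpoint reproduces the base model's outputs. Because LLaMA-Pro zero-initializes the output projections of the copied blocks, the expanded model should behave identically to the base model before any training, so a mismatch here would point at the expansion step rather than the finetuning.

```python
# Quick sanity check (sketch; paths are placeholders): the expanded-but-untrained
# checkpoint should reproduce the base model's logits, since the copied blocks
# are zero-initialized and act as identity maps before training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"        # placeholder
expanded_id = "path/to/expanded-untrained-checkpoint"  # placeholder

tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16).eval()
expanded = AutoModelForCausalLM.from_pretrained(expanded_id, torch_dtype=torch.bfloat16).eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    diff = (base(**inputs).logits - expanded(**inputs).logits).abs().max()

# Adding zero-output blocks should leave the logits essentially unchanged.
print(f"max logit difference: {diff.item()}")
```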

Hi! Have you tried directly finetuning llama-3-8B-instruct? What happens in that setting?
I did not run experiments with llama-3, so I am not very familiar with its behavior. You could also try changing the position of the added blocks: the recent Yi tech report and some llama3-120B models suggest that keeping the first few layers fixed may be important. Hope this helps!
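For concreteness, here is a minimal sketch of that idea, assuming a Hugging Face `LlamaForCausalLM` checkpoint. The model id, output directory, the number of added blocks, and the choice to keep the first 8 layers untouched are all placeholders, not settings from this thread; the point is that the zero-initialized copies are spread over the upper layers only.

```python
# Block-expansion sketch for a Hugging Face Llama checkpoint.
# Assumptions (placeholders, not from this thread): model id, output dir,
# 8 added blocks, and keeping the first 8 layers free of new blocks.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # placeholder
out_dir = "llama-3-8b-instruct-pro"               # placeholder

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(base_id)

layers = model.model.layers   # nn.ModuleList of decoder layers
keep_first = 8                # leave the first 8 layers untouched
num_new = 8                   # total number of copied blocks to insert
step = (len(layers) - keep_first) // num_new

new_layers, added = [], 0
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if i >= keep_first and (i - keep_first + 1) % step == 0 and added < num_new:
        block = copy.deepcopy(layer)
        # Zero the output projections so the copied block is an identity map
        # at initialization: the residual stream passes through unchanged.
        torch.nn.init.zeros_(block.self_attn.o_proj.weight)
        torch.nn.init.zeros_(block.mlp.down_proj.weight)
        new_layers.append(block)
        added += 1

# Re-index attention layers so KV-cache bookkeeping stays consistent
# (attribute presence depends on your transformers version).
for idx, layer in enumerate(new_layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx

model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)
model.save_pretrained(out_dir)
tok.save_pretrained(out_dir)
```

As I understand it, the llama_pro example in LLaMA-Factory then trains only the newly inserted blocks while the original layers stay frozen, which is what should preserve the original ability.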

OK, thanks! Could you share some links I could use as references to figure out the problem?

Certainly! Here is the link to Yi-9B (https://huggingface.co/01-ai/Yi-9B) and its tech report (https://arxiv.org/pdf/2403.04652).
You can find the depth upscaling in Sec. 7.3.
See also the 120B layer-stacked merge: https://huggingface.co/alpindale/goliath-120b

Thanks !

I have posted a new issue: hiyouga/LLaMA-Factory#3811. Would you please help explain it? Thanks!

Training on a small dataset for many epochs can easily lead to overfitting.

Thanks!