TencentARC/LLaMA-Pro

Thanks for the wonderful project! Why do I always see an apparent loss of the model's original ability?

hzgdeerHo opened this issue · 8 comments

Finetuning llama-3-8B-instruct with the same configuration as the example at https://github.com/hiyouga/LLaMA-Factory/tree/3df986c6793a51ec2cb5f31fd1808cd3a9883bc4/examples/extras/llama_pro always leads to an apparent loss of its original ability. I only used the "Identity" training dataset. Can you help? Thanks!

The final training loss is about 0.05-0.1, so I think it might not be caused by overfitting?
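One way to narrow this down (a minimal sketch, not from this thread; the model paths are placeholders) is to check that the expanded but still-untrained checkpoint reproduces the base model's outputs. Because LLaMA-Pro zero-initializes the output projections of the copied blocks, the expanded model should behave identically to the base model before any training, so a mismatch here would point at the expansion step rather than the finetuning.

```python
# Quick sanity check (sketch; paths are placeholders): the expanded-but-untrained
# checkpoint should reproduce the base model's logits, since the copied blocks
# are zero-initialized and act as identity maps before training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"        # placeholder
expanded_id = "path/to/expanded-untrained-checkpoint"  # placeholder

tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16).eval()
expanded = AutoModelForCausalLM.from_pretrained(expanded_id, torch_dtype=torch.bfloat16).eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    diff = (base(**inputs).logits - expanded(**inputs).logits).abs().max()

# Adding zero-output blocks should leave the logits essentially unchanged.
print(f"max logit difference: {diff.item()}")
```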

Hi! Have you tried directly finetuning llama-3-8B-instruct? What happens in that setting?
I did not run experiments with llama-3, so I am not very familiar with its behavior. You could also try changing the position of the added blocks: the recent Yi tech report and some llama3-120B models suggest that keeping the first few layers fixed may be important. Hope this helps!
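For concreteness, here is a minimal sketch of that idea, assuming a Hugging Face `LlamaForCausalLM` checkpoint. The model id, output directory, the number of added blocks, and the choice to keep the first 8 layers untouched are all placeholders, not settings from this thread; the point is that the zero-initialized copies are spread over the upper layers only.

```python
# Block-expansion sketch for a Hugging Face Llama checkpoint.
# Assumptions (placeholders, not from this thread): model id, output dir,
# 8 added blocks, and keeping the first 8 layers free of new blocks.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # placeholder
out_dir = "llama-3-8b-instruct-pro"               # placeholder

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(base_id)

layers = model.model.layers   # nn.ModuleList of decoder layers
keep_first = 8                # leave the first 8 layers untouched
num_new = 8                   # total number of copied blocks to insert
step = (len(layers) - keep_first) // num_new

new_layers, added = [], 0
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if i >= keep_first and (i - keep_first + 1) % step == 0 and added < num_new:
        block = copy.deepcopy(layer)
        # Zero the output projections so the copied block is an identity map
        # at initialization: the residual stream passes through unchanged.
        torch.nn.init.zeros_(block.self_attn.o_proj.weight)
        torch.nn.init.zeros_(block.mlp.down_proj.weight)
        new_layers.append(block)
        added += 1

# Re-index attention layers so KV-cache bookkeeping stays consistent
# (attribute presence depends on your transformers version).
for idx, layer in enumerate(new_layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx

model.model.layers = torch.nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)
model.save_pretrained(out_dir)
tok.save_pretrained(out_dir)
```

As I understand it, the llama_pro example in LLaMA-Factory then trains only the newly inserted blocks while the original layers stay frozen, which is what should preserve the original ability.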

OK, thanks! Could you share some links I could use as references to figure out the problem?

Certainly! Here is the link to Yi-9B (https://huggingface.co/01-ai/Yi-9B) and its tech report (https://arxiv.org/pdf/2403.04652).
You can find the depth upscaling in Sec. 7.3.
See also the 120B layer-stacked merge: https://huggingface.co/alpindale/goliath-120b

Thanks !

I have posted a new issue: hiyouga/LLaMA-Factory#3811. Would you please help explain it? Thanks!

Training on a small dataset for many epochs can easily lead to overfitting.

Thanks!