About the training seed
lucasliunju opened this issue · 5 comments
Hi,
Thanks for your great work.
May I ask whether you have tried using different random seeds when fine-tuning on the commonsense reasoning tasks?
I find that the test accuracy is not very stable if the random seed is not fixed.
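For reference, this is roughly what I mean by fixing the seed: a minimal sketch assuming a standard PyTorch/`transformers` training setup (the function name here is just my own wrapper):

```python
import torch
from transformers import set_seed

def make_deterministic(seed: int = 42):
    # set_seed seeds Python's random, NumPy, and PyTorch (CPU + CUDA) in one call.
    set_seed(seed)
    # Optionally make cuDNN deterministic as well; this can slow training slightly.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

make_deterministic(42)
```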
Hello, I haven't previously explored the impact of using different seeds for these experiments. All the results reported so far are from a single run. I will investigate this and provide you with an update later. Thank you for bringing it to my attention!
Thank you very much for your reply!
@lucasliunju Apologies for the delayed response; I was quite swamped with my other projects the last two weeks.
I have conducted 5 runs of fine-tuning LLaMA3 with DoRA (r=16), and the results are presented in the table below.
Although the results are similar across most of the runs, we can still observe some fluctuation among them. Run 5 in particular has the lowest average score, and notably the lowest BoolQ score out of all 6 runs (the 5 new runs plus the originally reported one); I also found that it has the lowest training loss at the end, which could imply that the model is overfitting. Since I used the same hyperparameter configuration as LoRA for these experiments, more extensive hyperparameter tuning might make the results considerably more stable and avoid the overfitting, but unfortunately I currently don't have the GPU bandwidth to conduct an extensive hyperparameter search.
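As a side note, when comparing runs like this it helps to report the mean and standard deviation across seeds rather than a single number. A minimal aggregation sketch (the script name and invocation are just placeholders; pass in the per-seed accuracies from your own evaluation logs):

```python
import sys
from statistics import mean, stdev

# Usage: python aggregate.py <acc_run1> <acc_run2> ... <acc_runN>
# Each argument is the test accuracy from one seeded run.
accs = [float(a) for a in sys.argv[1:]]
print(f"n={len(accs)}  mean={mean(accs):.3f}  std={stdev(accs):.3f}  "
      f"min={min(accs):.3f}  max={max(accs):.3f}")
```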
Hi, Thanks for your reply.
May I ask whether the commonsense reasoning tasks are a good choice for quickly evaluating the performance of different PEFT methods?