Can you describe a bit more about the training process, the data strategy, and the load given to inference in the paper?
deshwalmahesh opened this issue · 2 comments
Hi,
I read the paper, and apart from a single line mentioning the Instruction Fine-Tuning paradigm, there is nothing about the training process. Looking at the code, it seems you are doing full fine-tuning of the model rather than a PEFT approach like LoRA or p-tuning. Did you try those too?
How many GPUs, and how much time did training take for the different datasets (the paper says you tested with dataset sizes ranging from 3K to 100K)? What was the data size vs. training time?
Details like these would be helpful.
Thanks!
Thanks for your interest! The time cost of full fine-tuning is shown below:

| Model | GPUs | 3.5K samples | 10K samples | 30K samples | 100K samples |
|---|---|---|---|---|---|
| 7B | 8 | 0.30 | 0.82 | 2.38 | 7.77 |
| 13B | 8 | 0.55 | 1.51 | 4.33 | 14.83 |
| 33B | 16 | 0.77 | 2.08 | 5.85 | 20.29 |

The GPUs we used are A100 (40G), and all times in the table are in hours.
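As a rough rule of thumb, the wall-clock time above scales close to linearly with sample count, so other dataset sizes can be projected with a simple interpolation. A minimal sketch, assuming the linear trend holds on the same hardware:

```python
# Rough projection of full fine-tuning time from the table above.
# Assumes wall-clock time scales linearly with sample count on the
# same hardware (A100 40G); reference values are hours from the table.
MEASURED_HOURS = {  # model -> (samples, hours) at the 100K reference point
    "7B": (100_000, 7.77),
    "13B": (100_000, 14.83),
    "33B": (100_000, 20.29),
}

def estimate_hours(model: str, num_samples: int) -> float:
    ref_samples, ref_hours = MEASURED_HOURS[model]
    return ref_hours * num_samples / ref_samples

print(f"{estimate_hours('7B', 50_000):.2f} h")  # ~3.89 h for 50K samples
```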
We also plan to add the PEFT training scripts in November. :)
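In the meantime, for anyone who wants to experiment, a LoRA setup with the Hugging Face `peft` library looks roughly like the sketch below. The checkpoint name and hyperparameters are illustrative placeholders, not the configuration from the paper:

```python
# Minimal LoRA (PEFT) setup sketch with Hugging Face peft.
# Checkpoint and hyperparameters are placeholders, not the paper's values.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # low-rank adapter dimension
    lora_alpha=32,                        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the adapter weights receive gradients, the optimizer memory footprint drops well below that of full fine-tuning.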
Hey @Unrealluver, thanks for your response.
Could you please also share whether you tried PEFT approaches for fine-tuning, and if so, what the results were and how they varied with dataset size?
Also, did you test with a classification/regression head, given that you were concerned with the scores of two answers?
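(For clarity, by a regression head I mean something like the sketch below, assuming the Hugging Face `transformers` API; the checkpoint name is a placeholder, and `num_labels=1` yields a single scalar score per answer.)

```python
# Sketch: scoring an answer with a single-output regression head.
# Assumes the Hugging Face transformers API; checkpoint is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name,
    num_labels=1,  # one scalar output -> regression-style score
)

inputs = tokenizer("Question ... candidate answer ...", return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1)  # shape: (batch,)
print(score)
```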