Can you describe a bit more about the training process, the data strategy, and the load given to inference in the paper?
deshwalmahesh opened this issue · 2 comments
Hi,
I read the paper, and apart from a single line mentioning the Instruction Fine-Tuning paradigm, there is nothing about the training process. Looking at the code, it seems you are doing full fine-tuning of the model rather than a PEFT approach like LoRA or p-tuning. Did you try those too?
How many GPUs, and how much time did training take for the different datasets (the paper says you tested with dataset sizes ranging from 3K to 100K)? What was the data size vs. training time?
Details like these would be helpful.
Thanks!
Thanks for your interest! The time cost of full fine-tuning is shown below:

| Model | GPUs | 3.5K samples | 10K samples | 30K samples | 100K samples |
|---|---|---|---|---|---|
| 7B | 8 | 0.30 | 0.82 | 2.38 | 7.77 |
| 13B | 8 | 0.55 | 1.51 | 4.33 | 14.83 |
| 33B | 16 | 0.77 | 2.08 | 5.85 | 20.29 |

The GPUs we used are A100 (40G), and all times in the table are in hours.
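As a rough rule of thumb, the wall-clock time above scales close to linearly with sample count, so other dataset sizes can be projected with a simple interpolation. A minimal sketch, assuming the linear trend holds on the same hardware:

```python
# Rough projection of full fine-tuning time from the table above.
# Assumes wall-clock time scales linearly with sample count on the
# same hardware (A100 40G); reference values are hours from the table.
MEASURED_HOURS = {  # model -> (samples, hours) at the 100K reference point
    "7B": (100_000, 7.77),
    "13B": (100_000, 14.83),
    "33B": (100_000, 20.29),
}

def estimate_hours(model: str, num_samples: int) -> float:
    ref_samples, ref_hours = MEASURED_HOURS[model]
    return ref_hours * num_samples / ref_samples

print(f"{estimate_hours('7B', 50_000):.2f} h")  # ~3.89 h for 50K samples
```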
We also plan to add the PEFT training scripts in November. :)
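In the meantime, for anyone who wants to experiment, a LoRA setup with the Hugging Face `peft` library looks roughly like the sketch below. The checkpoint name and hyperparameters are illustrative placeholders, not the configuration from the paper:

```python
# Minimal LoRA (PEFT) setup sketch with Hugging Face peft.
# Checkpoint and hyperparameters are placeholders, not the paper's values.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # low-rank adapter dimension
    lora_alpha=32,                        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the adapter weights receive gradients, the optimizer memory footprint drops well below that of full fine-tuning.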
Hey @Unrealluver, thanks for your response.
Could you please also share whether you tried PEFT approaches for fine-tuning, and if so, what the results were and how they varied with dataset size?
Also, did you test with a classification/regression head, given that you were concerned with the scores of two answers?
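(For clarity, by a regression head I mean something like the sketch below, assuming the Hugging Face `transformers` API; the checkpoint name is a placeholder, and `num_labels=1` yields a single scalar score per answer.)

```python
# Sketch: scoring an answer with a single-output regression head.
# Assumes the Hugging Face transformers API; checkpoint is a placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name,
    num_labels=1,  # one scalar output -> regression-style score
)

inputs = tokenizer("Question ... candidate answer ...", return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze(-1)  # shape: (batch,)
print(score)
```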