Implementation Details about the Student Model
hzf1174 commented
Hi Yuxin,
Thank you for your great work! In your paper, you mention that your method is trained for 3 iterations, and that in each iteration the student model is trained for 3 epochs using the AdamW optimizer with a learning rate of 2e-5. I would like to clarify one point: in each iteration, did you initialize the student model from the same pre-trained LLaMA model, or from the model trained in the previous iteration? Thank you for your clarification!
YJiangcm commented
Thanks for your interest in our work. In each iteration, the student model starts from the model trained in the previous iteration.
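The warm-start schedule described in the reply can be sketched as a toy training loop. This is a minimal, self-contained sketch, not the authors' code. Only the 3 iterations, 3 epochs per iteration, AdamW, and the learning rate of 2e-5 come from the thread. The following are all assumptions: a two-parameter linear model stands in for the LLaMA student, `make_iteration_data` is a hypothetical placeholder for whatever data each iteration produces, and the batch size, weight decay, and resetting of optimizer state each iteration are illustrative choices.

```python
import math
import random

NUM_ITERATIONS = 3        # from the thread: 3 training iterations
EPOCHS_PER_ITERATION = 3  # from the thread: 3 student epochs per iteration
LEARNING_RATE = 2e-5      # from the thread: AdamW learning rate


class AdamW:
    """Minimal AdamW (Adam with decoupled weight decay) over a flat list of floats."""

    def __init__(self, lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
        # Assumption: weight_decay=0.01 (the PyTorch default); the thread does not state it.
        self.lr, self.eps, self.wd = lr, eps, weight_decay
        self.b1, self.b2 = betas
        self.m = self.v = None
        self.t = 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias-corrected first moment
            v_hat = self.v[i] / (1 - self.b2 ** self.t)  # bias-corrected second moment
            params[i] -= self.lr * self.wd * params[i]   # decoupled weight decay
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def mse_grads(params, batch):
    """Gradients of mean squared error for a toy linear model y = w * x + b."""
    w, b = params
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    n = len(batch)
    return [gw / n, gb / n]


def make_iteration_data(iteration, n=32, seed=0):
    """Hypothetical stand-in for the training data used in a given iteration."""
    rng = random.Random(seed + iteration)
    return [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]


def train_student(pretrained_params):
    """Train for NUM_ITERATIONS, warm-starting each iteration from the previous one."""
    params = list(pretrained_params)  # only iteration 1 starts from the pre-trained weights
    history = []  # (weights at start, weights at end) for each iteration
    for it in range(NUM_ITERATIONS):
        start = list(params)  # carried over from the end of the previous iteration
        optimizer = AdamW(lr=LEARNING_RATE)  # assumption: fresh optimizer state per iteration
        data = make_iteration_data(it)
        for _ in range(EPOCHS_PER_ITERATION):
            for example in data:  # assumption: batch size 1
                optimizer.step(params, mse_grads(params, [example]))
        history.append((start, list(params)))
    return params, history


final, history = train_student([0.0, 0.0])
for k, (start, end) in enumerate(history, 1):
    print(f"iteration {k}: start={start} end={end}")
```

Each iteration's starting weights equal the previous iteration's final weights. To instead restart every iteration from the same pre-trained checkpoint (the alternative raised in the question), you would reset `params = list(pretrained_params)` at the top of the loop.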