Implementation Details about the Student Model

Question

Opened this issue 10 months ago · 1 comment

Hi Yuxin,

Thank you for your great work! Your paper says the method trains for 3 iterations, and that in each iteration the student model is trained for 3 epochs using an AdamW optimizer with learning rate = 2e-5. Could you clarify whether each iteration starts from the same pre-trained LLaMA model, or from the student model trained in the previous iteration? Thank you for your clarification!

Answer 1 · 2023-11-20T01:33:11.000Z

Thanks for your interest in our work. In each iteration, the student model starts from the model trained in the previous iteration, not from the original pre-trained LLaMA model.
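
For readers who want to see the difference concretely, here is a minimal sketch of the "continue from the last iteration" schedule described above. It is not the authors' code: the toy dictionary model, `train_one_epoch`, `run_iterations`, and the per-iteration data are all hypothetical stand-ins, and the update rule is a placeholder for real AdamW fine-tuning. Only the counts (3 iterations, 3 epochs each) and the learning rate (2e-5) come from the thread.

```python
import copy

NUM_ITERATIONS = 3         # from the question
EPOCHS_PER_ITERATION = 3   # from the question
LEARNING_RATE = 2e-5       # from the question (the paper uses AdamW)


def train_one_epoch(model, data, lr):
    """Hypothetical stand-in for one epoch of fine-tuning.

    Returns an updated copy of the model; the update rule is a toy
    placeholder, not AdamW.
    """
    updated = copy.deepcopy(model)
    updated["weight"] -= lr * sum(data)
    updated["epochs_seen"] += 1
    return updated


def run_iterations(pretrained, data_per_iteration):
    """Train the student so each iteration continues from the previous one."""
    # The pre-trained weights are loaded once, before the first iteration.
    student = copy.deepcopy(pretrained)
    for data in data_per_iteration:
        # No reset here: the student carries over from the last iteration.
        # Restarting from the same pre-trained model would instead re-copy
        # `pretrained` at this point.
        for _ in range(EPOCHS_PER_ITERATION):
            student = train_one_epoch(student, data, LEARNING_RATE)
    return student


# Toy pre-trained model and made-up data for each of the 3 iterations.
pretrained = {"weight": 1.0, "epochs_seen": 0}
iteration_data = [[1.0], [2.0], [3.0]]
assert len(iteration_data) == NUM_ITERATIONS

final_student = run_iterations(pretrained, iteration_data)
print(final_student["epochs_seen"])
```

Under this schedule the final student has accumulated all 9 epochs of updates, whereas restarting from the pre-trained model each time would leave it with only the last iteration's 3 epochs.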