YJiangcm/Lion

Implementation Details about the Student Model


Hi Yuxin,

Thank you for your great work! In your paper, you mention that your method runs 3 training iterations, and that in each iteration you train the student model for 3 epochs using an AdamW optimizer with a learning rate of 2e-5. Could you clarify whether, in each iteration, the student model starts from the same pre-trained LLaMA model or from the model trained in the previous iteration? Thank you for your clarification!

Thanks for your interest in our work. In each iteration, the student model starts from the model trained in the previous iteration, not from the original pre-trained LLaMA checkpoint.
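
For readers who want to see the difference concretely, here is a minimal sketch of the warm-start schedule described above. It is not taken from the Lion codebase: `Checkpoint`, `fine_tune`, and `run_iterations` are hypothetical names, and `fine_tune` only does bookkeeping in place of real training (no AdamW step is actually run). The point is just that each iteration receives the checkpoint produced by the one before it.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical record of a model checkpoint (no real weights)."""
    name: str
    epochs_trained: int = 0
    lineage: list = field(default_factory=list)  # checkpoints this one descends from


def fine_tune(start: Checkpoint, iteration: int, epochs: int = 3) -> Checkpoint:
    """Stand-in for one round of student fine-tuning (assumed: 3 epochs per round)."""
    return Checkpoint(
        name=f"student-iter{iteration}",
        epochs_trained=start.epochs_trained + epochs,
        lineage=start.lineage + [start.name],
    )


def run_iterations(pretrained: Checkpoint, num_iterations: int = 3, epochs: int = 3) -> Checkpoint:
    student = pretrained
    for it in range(1, num_iterations + 1):
        # Warm start: the input is the student from the previous iteration,
        # not a fresh copy of the pre-trained checkpoint.
        student = fine_tune(student, it, epochs)
    return student


final = run_iterations(Checkpoint("llama-pretrained"))
print(final.name, final.epochs_trained, final.lineage)
# student-iter3 9 ['llama-pretrained', 'student-iter1', 'student-iter2']
```

Under this schedule the final student has accumulated 9 epochs of training in total (3 iterations × 3 epochs). Restarting from the pre-trained checkpoint each time would instead mean replacing `student = fine_tune(student, ...)` with `student = fine_tune(pretrained, ...)`.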