YJiangcm/Lion

About the training loss.

Opened this issue · 3 comments

Hi Yuxin!

Thanks for your great work!

While reading the paper, I got confused about the training loss of the student model. The paper says "we fine-tune our student model S by minimizing the cross-entropy loss." How is the CE loss used to fine-tune the model, and where is the code implementation for this part? Thank you very much!

Best wishes!

Thank you for your interest in our work.

In our research, we utilize the autoregressive language modeling objective to train the student model. This involves using the teacher model's responses to a set of instructions as the target for the student model. Since the language modeling objective is actually the cross-entropy loss, we refer to this objective as "cross-entropy loss" in our paper. We apologize for any ambiguity caused by this terminology. The primary goal of training the student model is to align its responses with those of the teacher model.
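Concretely, the objective is just the usual next-token prediction loss, computed only over the tokens of the teacher's response. Here is an illustrative PyTorch sketch of that cross-entropy (not the actual code in src/train.py; the function name is ours):

```python
import torch.nn.functional as F

IGNORE_INDEX = -100  # label value whose positions are excluded from the loss

def autoregressive_ce_loss(logits, labels):
    """Next-token cross-entropy: position t predicts token t+1.

    logits: (batch, seq_len, vocab) from the student model.
    labels: (batch, seq_len) token ids; instruction/prompt positions are set
            to IGNORE_INDEX so only the teacher's response is supervised.
    """
    # Shift so that the logits at position t are scored against token t+1
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
```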

The code is implemented in src/train.py, which follows a similar approach to instruction tuning in Stanford Alpaca.
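For reference, the Alpaca-style preprocessing looks roughly like this (a hedged sketch; the prompt template, model name, and function names are illustrative, not copied from src/train.py). The target sequence is the teacher's response, and the prompt tokens are masked out of the labels:

```python
from transformers import AutoTokenizer

IGNORE_INDEX = -100

PROMPT_TEMPLATE = (  # Alpaca-style prompt; the exact wording is illustrative
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_example(tokenizer, instruction, teacher_response):
    """Build one training example: supervise only the teacher's response."""
    source = PROMPT_TEMPLATE.format(instruction=instruction)
    target = teacher_response + tokenizer.eos_token

    source_ids = tokenizer(source, add_special_tokens=False)["input_ids"]
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]

    input_ids = source_ids + target_ids
    labels = [IGNORE_INDEX] * len(source_ids) + target_ids  # mask the prompt
    return {"input_ids": input_ids, "labels": labels}

# Example usage (tokenizer name is a placeholder):
# tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
# ex = build_example(tok, "Explain overfitting.", "Overfitting happens when ...")
```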

I hope this addresses your concerns. If you have any further questions, please let me know.

Hi!

Thanks for your prompt response!

Actually, I am still confused about the loss. First, in which lines of src/train.py do you define the loss? Second, if the teacher model and the student model both output text, how is cross-entropy used to calculate the loss?

Thank you very much!

In lines 116 to 142 of src/train.py, we define the dataset used for training, which contains the input as well as the label (target). The training loss is defined internally in transformers.AutoModelForCausalLM and is computed automatically when we call transformers.Trainer.train(). You may check the related documentation.
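In other words, the cross-entropy is not written out explicitly in src/train.py: when a Hugging Face causal LM receives `labels`, its forward pass computes the shifted next-token cross-entropy internally and returns it as `outputs.loss`, which transformers.Trainer then backpropagates. A minimal sketch (the model name is just a small placeholder for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small model, just for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

batch = tokenizer("### Instruction: ... ### Response: ...", return_tensors="pt")
# Supplying labels makes the model compute the autoregressive cross-entropy
# itself; in real training the prompt positions in labels are set to -100.
outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
print(outputs.loss)  # the scalar that Trainer.train() minimizes
```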