Difference between SFTTrainer and Seq2seqTrainer
Hyfred opened this issue · 0 comments
Hyfred commented
The traditional approach separates the input (e.g., a document) from the label (e.g., its summary), and the loss is computed by comparing the generated output against the label. I believe this corresponds to the Seq2SeqTrainer.
The SFTTrainer, by contrast, concatenates the input and label into a single sequence (so the model's input and its labels are the same token sequence) and trains on it as a next-token prediction task. While these approaches seem similar, I wonder whether there is a performance difference between the two. Does anyone have a sense of which mechanism is better suited to which scenarios?
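To make the contrast concrete, here is a minimal sketch (with made-up token IDs, not actual TRL or Transformers internals) of how the two setups arrange inputs and labels. The convention of masking positions with `-100` so they are ignored by the cross-entropy loss is the standard Hugging Face one; the variable names are illustrative.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

prompt_ids = [10, 11, 12]    # e.g. tokenized document
response_ids = [20, 21, 22]  # e.g. tokenized summary

# Seq2SeqTrainer (encoder-decoder): input and labels are separate
# sequences, and every label token contributes to the loss.
seq2seq_inputs = prompt_ids
seq2seq_labels = response_ids

# SFTTrainer-style causal LM: prompt and response are concatenated into
# one sequence trained with next-token prediction. By default the loss
# covers the whole sequence; with prompt masking (a "completion-only"
# setup) the prompt positions are set to IGNORE_INDEX so only the
# response tokens contribute.
sft_input_ids = prompt_ids + response_ids
sft_labels_full = list(sft_input_ids)
sft_labels_completion_only = [IGNORE_INDEX] * len(prompt_ids) + response_ids

def loss_positions(labels):
    """Indices that actually contribute to the training loss."""
    return [i for i, t in enumerate(labels) if t != IGNORE_INDEX]

print(loss_positions(sft_labels_full))             # [0, 1, 2, 3, 4, 5]
print(loss_positions(sft_labels_completion_only))  # [3, 4, 5]
```

So the main mechanical difference is which tokens the loss is taken over: the seq2seq setup always supervises only the target tokens, while the causal-LM setup supervises the prompt as well unless it is explicitly masked out.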