Difference between SFTTrainer and Seq2seqTrainer
Hyfred opened this issue · 0 comments
Hyfred commented
The traditional approach separates the input (e.g., a document) from the label (e.g., its summary), and the loss is computed by comparing the generated output against the label. I believe this corresponds to the Seq2SeqTrainer.
The SFTTrainer, by contrast, concatenates the input and label into a single sequence (so the model's input and its labels are the same token sequence) and trains on it as a next-token prediction task. While these approaches seem similar, I wonder whether there is a performance difference between the two. Does anyone have a sense of which mechanism is better suited to which scenarios?
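To make the contrast concrete, here is a minimal sketch (with made-up token IDs, not actual TRL or Transformers internals) of how the two setups arrange inputs and labels. The convention of masking positions with `-100` so they are ignored by the cross-entropy loss is the standard Hugging Face one; the variable names are illustrative.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

prompt_ids = [10, 11, 12]    # e.g. tokenized document
response_ids = [20, 21, 22]  # e.g. tokenized summary

# Seq2SeqTrainer (encoder-decoder): input and labels are separate
# sequences, and every label token contributes to the loss.
seq2seq_inputs = prompt_ids
seq2seq_labels = response_ids

# SFTTrainer-style causal LM: prompt and response are concatenated into
# one sequence trained with next-token prediction. By default the loss
# covers the whole sequence; with prompt masking (a "completion-only"
# setup) the prompt positions are set to IGNORE_INDEX so only the
# response tokens contribute.
sft_input_ids = prompt_ids + response_ids
sft_labels_full = list(sft_input_ids)
sft_labels_completion_only = [IGNORE_INDEX] * len(prompt_ids) + response_ids

def loss_positions(labels):
    """Indices that actually contribute to the training loss."""
    return [i for i, t in enumerate(labels) if t != IGNORE_INDEX]

print(loss_positions(sft_labels_full))             # [0, 1, 2, 3, 4, 5]
print(loss_positions(sft_labels_completion_only))  # [3, 4, 5]
```

So the main mechanical difference is which tokens the loss is taken over: the seq2seq setup always supervises only the target tokens, while the causal-LM setup supervises the prompt as well unless it is explicitly masked out.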