dvlab-research/Step-DPO

During DPO training, will SFT loss be calculated?

mohhao opened this issue · 0 comments

I ask because the Step-DPO training samples contain incomplete outputs, which might affect the outputs of the SFT model.
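
To make the question concrete, here is a minimal sketch of what "adding an SFT loss during DPO training" could mean: the standard DPO objective plus an optional, weighted negative log-likelihood term on the chosen response. This is not taken from the Step-DPO code. The function names, the `sft_weight` parameter, and the per-token normalization are all assumptions made for illustration, and the inputs are summed sequence log-probabilities given as plain floats.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sft_loss(policy_chosen_logp, num_chosen_tokens):
    """Hypothetical SFT term: per-token NLL of the chosen response."""
    return -policy_chosen_logp / num_chosen_tokens


def combined_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp,
                  num_chosen_tokens, beta=0.1, sft_weight=0.0):
    """DPO loss plus an optional SFT term (sft_weight is an assumed knob).

    With sft_weight=0.0 this reduces to plain DPO.
    """
    dpo = dpo_loss(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta)
    return dpo + sft_weight * sft_loss(policy_chosen_logp, num_chosen_tokens)


# Example: identical policy and reference log-probs give a zero margin.
plain = combined_loss(-6.0, -4.0, -6.0, -4.0, num_chosen_tokens=3)
mixed = combined_loss(-6.0, -4.0, -6.0, -4.0, num_chosen_tokens=3,
                      sft_weight=0.5)
```

If a term like `sft_loss` were enabled, the chosen samples would be trained on directly as targets, which is where incomplete outputs could matter; with `sft_weight=0.0` only the preference margin is optimized.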