dvlab-research/Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Python
Issues
Ablation between DPO and Step-DPO
#20 opened by tqzhong - 0
Question about eval results on Qwen2-7B-Instruct
#21 opened by LUOMU17 - 3
Request for Citation
#5 opened by hbin0701 - 0
Does Step-DPO work?
#19 opened by hxdtest - 1
Question about StepDPOTrainer
#18 opened by FlyingDutchman26 - 0
Question about Data Construction
#15 opened by hong-xl - 0
I followed the steps in the README file to train the model, but I got an error. Here is the error message.
#16 opened by Claude121381011 - 8
Evaluation scripts for AIME and Odyssey-MATH
#14 opened by bmanczak - 4
Share SFT dataset
#1 opened by yyht - 1
Question about the inference results of deepseek-math-7b-rl-stepdpo
#13 opened by wjn1996 - 4
Validation set
#12 opened by kaishxu - 1
Appendix missing
#11 opened by ChrisMii - 0
During DPO training, will SFT loss be calculated?
#10 opened by mohhao - 0
Question about the Data Construction Pipeline
#9 opened by yyht - 1
Data Generation Pipeline
#4 opened by yapdianang - 2
Question about DPO vs. Step-DPO
#7 opened by flow3rdown - 1