dvlab-research/Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Python
Issues
Ablation between DPO and Step-DPO
#20 opened by tqzhong - 0
Question about eval results on Qwen2-7B-Instruct
#21 opened by LUOMU17 - 3
Request for Citation
#5 opened by hbin0701 - 0
Does Step-DPO work?
#19 opened by hxdtest - 1
Question about StepDPOTrainer
#18 opened by FlyingDutchman26 - 0
Question about Data Construction
#15 opened by hong-xl - 0
I followed the steps in the README file to train the model, but I got an error. Here is the error message.
#16 opened by Claude121381011 - 8
Evaluation scripts for AIME and Odyssey-MATH
#14 opened by bmanczak - 4
Share SFT dataset
#1 opened by yyht - 1
Question about the inference results of deepseek-math-7b-rl-stepdpo
#13 opened by wjn1996 - 4
Validation set
#12 opened by kaishxu - 1
Appendix missing
#11 opened by ChrisMii - 0
During DPO training, will SFT loss be calculated?
#10 opened by mohhao - 0
Question about the Data Construction Pipeline
#9 opened by yyht - 1
Data Generation Pipeline
#4 opened by yapdianang - 2
Question about DPO vs. Step-DPO
#7 opened by flow3rdown - 1