dvlab-research/Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Python
Stargazers
- AFeng-x (SCUT | Alibaba)
- AlongWY (Research Center for Social Computing and Information Retrieval)
- bozhao-li
- CMathmo
- Eddieeee-Liu (Smartmore)
- Fanziyang-v (Shenzhen)
- GanjinZero (DAMO Academy)
- ggsonic
- hzy312 (Beijing)
- j-cyoung (Harbin Institute of Technology (ShenZhen))
- jihanyang (The University of Hong Kong)
- JulianJuaner (CUHK, SmartMore)
- leaves162 (HITsz)
- liuyijungoon (Harbin Institute of Technology, Shenzhen)
- lloongx
- lu-m13 (Intel Labs China)
- Pbihao (The Chinese University of Hong Kong)
- rentainhe (IDEA)
- ruihang-chu (CUHK)
- Serge-weihao (SJTU)
- seshurajup (@dolcera)
- ShaoTengLiu (CUHK)
- studentfromChina
- tau-yihouxiang (Singapore)
- TGLTommy
- tianzhuotao (The Chinese University of Hong Kong)
- VandyLu (Hong Kong)
- VincentDENGP
- wcy1122 (The Chinese University of Hong Kong)
- wonderseen
- X-Lai (The Chinese University of Hong Kong)
- Yangsenqiao (Harbin Institute of Technology)
- yanwei-li (The Chinese University of Hong Kong)
- yifanpu001 (Tsinghua University)
- yukang2017 (CUHK)
- zs-zhong