liziniu/policy_optimization
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
Language: Python
Stargazers
- Alan-QinHKUST
- BepfCp (Nanjing University)
- Chenruishuo (Nanjing University)
- chloefresh
- emigmo (Tsinghua University)
- George-Chia
- glorgao
- jyhong836 (University of Texas at Austin)
- l0he1g (China)
- liziniu (The Chinese University of Hong Kong, Shenzhen)
- Olivia-fsm (École Polytechnique Fédérale de Lausanne)
- shercklo (Nanjing University)
- Tanliandeshaonv
- tianxusky (Nanjing University)
- TianyunYoung (China, Beijing)
- TrbingWY
- wz139704646 (Nanjing University)
- xiami2019 (Fudan University & Sun Yat-Sen University)
- yqt
- zbzhu99 (SJTU | Apex Lab)
- zldscr0
- zyushun
- zzq-bot (Nanjing University)