pickxiguapi/Clean-Offline-RLHF
Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR 2024)
Python · MIT License
Stargazers
- AAISSJ (Sungkyunkwan Univ.)
- alexchen-buaa
- BluedragonXVI (New York)
- CeciliaYao
- chenchangjin-ccj
- cm090999
- Earthring (Tsinghua University)
- emigmo (Tsinghua University)
- evdcush
- Evolutionary-Intelligence (CSE, SUSTech)
- Fhujinwu (South China University of Technology)
- forrestbing (Alibaba Inc)
- GuanxingLu (Southeast -> Tsinghua)
- hany606 (Daejeon, South Korea)
- hilookas
- Jasonxu1225 (The Chinese University of Hong Kong, Shenzhen)
- liuhc2022
- Manchery (Tsinghua University)
- MengHsux (City University of Hong Kong)
- nidesuipao
- nissymori (The University of Tokyo)
- pickxiguapi
- qintianhaohao
- secondwtq (Sichuan University)
- sh-jj (Nanjing University)
- ShuoZheLi
- shyamsn97
- SibylGao (Institute of Computing Technology, Chinese Academy of Sciences)
- superboySB (Beijing Institute of Technology)
- thomas475
- trialbox
- zhimin-z (Software Analysis and Intelligence Lab)
- Zhiyu-h