thomfoster/minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
Python
Stargazers
- 1140310118 (Harbin Institute of Technology, Shenzhen)
- aieveryday
- augustintoma
- bobosui (Shanghai)
- BugCreat0r
- cateto (Seoul)
- chongzicbo
- DanteNoguez
- dawniii
- dgo2dance
- dmariko-yseop (@yseop)
- dumpmemory
- Eden1114 (Tsinghua University)
- GanjinZero (DAMO Academy)
- goddoe (NAVER Cloud, Hyperscale AI)
- ICT-XY
- jack139 (Amoy, China)
- James4Ever0
- jon-tow (New York, New York)
- Leon-Francis (Harbin Institute of Technology, Shenzhen)
- linyubupa
- lucascassiano (Artificial inc. / MIT - Media Lab)
- Ma-Dan
- meet-cjli
- rogervaas
- RossSong (Korea)
- RZ-Q
- sglucas
- we1l1n
- xianbin7
- Xiang-Pan (National University of Singapore)
- xrsrke (@huggingface)
- yetiansh (The University of Hong Kong)
- zerlinwang (Tsinghua University)
- zhenpingli (University of Chinese Academy of Sciences)
- ziqinyeow (Grab | University Of Malaya)