l294265421/alpaca-rlhf
Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
Python · MIT
Stargazers
- 2659170494 (ZhangPu)
- allendred
- authurlord (Beihang University)
- BinJingZuo
- DanqingZ (University of California Berkeley)
- dtonew
- Dwyane3
- hijkzzz (NVIDIA)
- houwenxin
- Keysmis
- Lauorie
- lin1490188
- LIngerwsk
- LiuShixing
- lnyxzdevk
- lurenlym
- mittalpusa
- Mr-Nineteen (Shanghai)
- murphypei (Peking University)
- MyHerbTea
- pwq1989
- RobertWang-Github
- scarydemon2 (JD)
- skepsun
- Stark-zheng
- sunkyya
- superjamessyx (Westlake University)
- TechWithRay (Google)
- TheEighthDay (RUC AIMC)
- ufwt
- wang-zerui (Shanghai AI Laboratory)
- we1l1n
- XiaoYee (Shanghai)
- yanshanjing
- yiranvang
- zhaobinNF (Fudan University)