anthropics/hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
License: MIT
Watchers
- amazon10
- aramasethu (Prediction Guard)
- CamaradaLares
- dganguli (Anthropic)
- duyvuleo (@oracle)
- eemailme
- hessjp
- JamieJoyce
- jareddk
- l1n (Noble Jury Software)
- levitation (Simplify / Macrotec LLC)
- liangdu
- midmarketplace (MidMarket Alliance)
- MyDevClouds
- no-identd (Laniakea)
- nuryslyrt (AWS)
- phymucs
- qtvhao (FPT Software)
- trappedinspacetime (For Personal Use)
- yuzhangbit (Beijing)