anthropics/hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
License: MIT
Watchers
- amazon10
- aramasethu (Prediction Guard)
- CamaradaLares
- dganguli (Anthropic)
- duyvuleo (@oracle)
- eemailme
- hessjp
- JamieJoyce
- jareddk
- l1n (Noble Jury Software)
- levitation (Simplify / Macrotec LLC)
- liangdu
- midmarketplace (MidMarket Alliance)
- MyDevClouds
- no-identd (Laniakea)
- nuryslyrt (AWS)
- phymucs
- qtvhao (FPT Software)
- trappedinspacetime (For Personal Use)
- yuzhangbit (Beijing)