mrahtz/learning-from-human-preferences
Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
Python · MIT
Stargazers
- abdel (Sydney, Australia)
- abursuc (valeo.ai)
- AdamStelmaszczyk
- alejandrodumas (Buenos Aires, Argentina)
- alexrigler (San Francisco)
- allenye0119
- AndrewDotson
- ayox
- bigtreeljc
- bradfordlynch
- chopwoodwater
- ddboline (New York, NY)
- DenySu
- Duncanswilson (University of Nevada, Reno)
- dvasiliauskas (Dublin, Ireland)
- izzeddingur
- jbwhit (Data Scientist)
- joelburget (@google)
- kdudekul
- keiohta (Tokyo, Japan)
- Kelvinson (Somewhere)
- kun1989
- lavi135246 (Taiwan)
- minimumnz
- mxochicale (Advanced Research Computing Centre, @UCL)
- Naereen (Lycée Kléber | Éducation Nationale)
- nottombrown (Anthropic)
- regata
- rmarquis
- shodow
- sksq96
- sunshine-deep
- themech (Wikia)
- tshrjn (New York City)
- txizzle (UC Berkeley, @mlberkeley)
- wwxFromTju (DRL/MAS)