mrahtz/learning-from-human-preferences

Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"

Python · MIT

Readme
10 Issues
309 Stargazers
11 Watchers

Watchers

ammonite
Converge Inc.
aobai
gabrielcc2
Rewe Digital, University of Magdeburg, OvGU
gnayuy
Seattle, WA
jhcloos
jinjunqi
THU BigEye
maphysart
mrahtz
paper2code-bot
@paper2code
RubinOrlando
wx-b
RIOS
