learning-from-human-preferences

Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"

Primary language: Python
License: MIT