Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
Primary LanguagePythonMIT LicenseMIT