/trpo

Trust Region Policy Optimization with TensorFlow and OpenAI Gym

Primary language: Jupyter Notebook. License: MIT.