/trpo

Trust region policy optimization (TRPO) based on Gym and TensorFlow; can run in distributed mode.

Primary Language: Python
