PyTorch implementation of TRPO
This repo contains a PyTorch implementation of a Trust Region Policy Optimization agent for an environment with a discrete action space.
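For a discrete action space, the policy outputs a categorical distribution over actions, and TRPO limits each update by the KL divergence between the old and new categorical policies. The core of the update solves F x = g for the natural-gradient direction, where F is the Fisher matrix and g is the policy gradient, using conjugate gradient with only Fisher-vector products. It then scales the step so that the quadratic KL estimate 0.5 * step^T F step equals the trust-region size. The following is a minimal, framework-free sketch of those pieces, assuming parameters and gradients are flat lists of floats and that a Fisher-vector product function `fvp` is supplied. The names `categorical_kl`, `conjugate_gradient`, `trpo_step` and `max_kl` are hypothetical and not taken from this repo's code.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def categorical_kl(p, q):
    """KL(p || q) between two discrete action distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = [0.0] * len(g)
    r = [float(gi) for gi in g]  # residual g - F x, with x = 0 initially
    p = list(r)                  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:             # residual small enough: stop early
            break
        Fp = fvp(p)
        alpha = rr / dot(p, Fp)  # exact step length along p for a quadratic
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fpi for ri, fpi in zip(r, Fp)]
        rr_new = dot(r, r)
        # New direction is F-conjugate to the previous ones
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def trpo_step(fvp, g, max_kl=0.01):
    """Natural-gradient step scaled so that 0.5 * step^T F step == max_kl."""
    s = conjugate_gradient(fvp, g)
    shs = dot(s, fvp(s))                  # s^T F s
    beta = math.sqrt(2.0 * max_kl / shs)  # solves 0.5 * beta^2 * shs = max_kl
    return [beta * si for si in s]


if __name__ == "__main__":
    F = [[4.0, 1.0], [1.0, 3.0]]  # toy positive-definite "Fisher" matrix

    def fvp(v):
        return [dot(row, v) for row in F]

    g = [1.0, 2.0]
    print(conjugate_gradient(fvp, g))    # close to [1/11, 7/11]
    step = trpo_step(fvp, g, max_kl=0.01)
    print(0.5 * dot(step, fvp(step)))    # close to 0.01
```

In a PyTorch agent the Fisher-vector product would typically be computed by differentiating the mean categorical KL twice with autograd rather than forming F explicitly. The scaled step is usually followed by a backtracking line search that shrinks it until the surrogate objective improves and the measured KL stays within `max_kl`.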
Environment Setup
- Install conda for Python 2.7.
conda create --name trpo --file requirements/conda_requirements.txt
source activate trpo
pip install -r requirements/pip_requirements.txt
- Install PyTorch from source at commit eff5b8b.
Usage
python run_trpo.py --env=GYM_ENV_ID
where GYM_ENV_ID is the ID of the Gym environment you wish to train on.
Results
A game of Pong played using the policy learned by the TRPO agent
Plot of total reward per episode vs. episode number for Pong
Related Repos
OpenAI Baselines' implementation of parallel TRPO in TensorFlow
Ilya Kostrikov's implementation of TRPO for continuous control in PyTorch