FitMachineLearning/pytorch-trpo

PyTorch Implementation of Trust Region Policy Optimization (TRPO)

Python, MIT license

PyTorch implementation of TRPO

This repo contains a PyTorch implementation of a Trust Region Policy Optimization agent for an environment with a discrete action space.
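The core of a TRPO update can be sketched in a few lines. This is a dependency-free illustration, not code from this repo. It makes several assumptions. Parameters are flat Python lists rather than PyTorch tensors. `fvp` is an assumed callable that returns the Fisher-vector product F v; in practice this is obtained by differentiating the mean KL divergence twice with autograd. All function names are hypothetical.

The sketch solves F x = g with conjugate gradient and scales the result so that the quadratic KL estimate ½ sᵀF s equals `max_kl`. It then backtracks. The line search is a simplified variant: it accepts the first shrunken step that improves the surrogate objective and satisfies the KL limit.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(fvp: Callable[[Vector], Vector], g: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = [0.0] * len(g)
    r = list(g)          # residual g - F x (x starts at zero)
    p = list(r)          # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        Fp = fvp(p)
        alpha = rr / dot(p, Fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fpi for ri, fpi in zip(r, Fp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def categorical_kl(p_old: Vector, p_new: Vector) -> float:
    """KL(p_old || p_new) between two categorical (discrete-action) distributions."""
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0)


def natural_gradient_step(fvp: Callable[[Vector], Vector], g: Vector,
                          max_kl: float = 0.01) -> Vector:
    """Scale the direction F^-1 g so that 0.5 * s^T F s == max_kl."""
    x = conjugate_gradient(fvp, g)
    xFx = dot(x, fvp(x))
    beta = math.sqrt(2.0 * max_kl / xFx)
    return [beta * xi for xi in x]


def line_search(params: Vector, full_step: Vector,
                surrogate: Callable[[Vector], float],
                kl: Callable[[Vector, Vector], float],
                max_kl: float = 0.01, max_backtracks: int = 10) -> Vector:
    """Halve the step until the surrogate improves and the KL constraint holds."""
    base = surrogate(params)
    frac = 1.0
    for _ in range(max_backtracks):
        candidate = [p + frac * s for p, s in zip(params, full_step)]
        if surrogate(candidate) > base and kl(params, candidate) <= max_kl:
            return candidate
        frac *= 0.5
    return list(params)  # no acceptable step found: keep the old parameters
```

For a discrete action space, `categorical_kl` is the divergence the trust region bounds. It is computed between the old and new softmax action probabilities at each visited state and averaged over the batch.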

Environment Setup

Install conda for Python 2.7.

conda create --name trpo --file requirements/conda_requirements.txt
source activate trpo
pip install -r requirements/pip_requirements.txt

Install PyTorch from source at commit eff5b8b.

Usage

python run_trpo.py --env=GYM_ENV_ID

where GYM_ENV_ID is the ID of the Gym environment you wish to train on.

Results

A game of Pong played using the policy model learned by a TRPO agent

Plot of total reward per episode vs. episode number for Pong
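A plot like this aggregates per-step rewards into one total per episode. The following is a minimal sketch that assumes rewards are logged as hypothetical `(reward, done)` pairs. The repo's own logging format may differ, and `episode_totals` is not one of its functions. In Pong, each point scored yields a reward of +1 or -1, so an episode total lies between -21 and +21.

```python
from typing import Iterable, List, Tuple


def episode_totals(transitions: Iterable[Tuple[float, bool]]) -> List[float]:
    """Sum rewards within each episode; `done` marks the final step of an episode."""
    totals: List[float] = []
    running = 0.0
    for reward, done in transitions:
        running += reward
        if done:
            totals.append(running)
            running = 0.0
    return totals  # a trailing, unfinished episode is dropped
```

Plotting `episode_totals(...)` against `range(1, len(totals) + 1)` gives total reward per episode vs. episode number.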

Related Repos

OpenAI Baselines implementation of parallel TRPO in TensorFlow

Ilya Kostrikov's implementation of TRPO for continuous control in PyTorch