relax_trpo_example

Example TRPO implementation with ReLAx

Primary language: Jupyter Notebook

This repository contains an implementation of Trust Region Policy Optimization (TRPO) built with ReLAx.

The TRPO actor was trained on the HalfCheetah-v2 MuJoCo Gym environment for 4M environment steps.

The plot below shows average return versus training step (batch_size=40000):

[Figure: trpo_training, average return vs. training step]
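As a rough sanity check on the training budget: if batch_size counts the environment steps collected per TRPO policy update (an assumption; the README does not define it), then the 4M-step run amounts to about 100 policy updates. The helper below is hypothetical and only illustrates that arithmetic; it is not part of ReLAx.

```python
# Hypothetical helper, not a ReLAx API.
# Assumes batch_size = number of env steps collected before each TRPO update.
def num_policy_updates(total_env_steps: int, batch_size: int) -> int:
    """Return how many full batches (policy updates) fit in the step budget."""
    return total_env_steps // batch_size


updates = num_policy_updates(4_000_000, 40_000)
print(updates)  # 100
```

Under a different reading of batch_size (for example, counting trajectories rather than steps), the number of updates would differ.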

Resulting policy:

trpo_run.mp4