/relax_vpg_example

Example VPG implementation with ReLAx

Primary language: Jupyter Notebook

This repository contains an implementation of vanilla policy gradient (VPG) with ReLAx.
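For readers new to VPG, the core of the update can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the ReLAx API or the notebook's code: the function names `rewards_to_go` and `vpg_surrogate_loss`, the discount value, and the mean-return baseline are all assumptions made for the example.

```python
import math


def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each step to the end of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def vpg_surrogate_loss(log_probs, returns):
    """Negative mean of log pi(a_t | s_t) * (R_t - baseline).

    Assumption: the baseline is the mean return of the batch; the
    repository may use a different baseline (or a learned critic).
    """
    baseline = sum(returns) / len(returns)
    weighted = [lp * (r - baseline) for lp, r in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)


# Toy episode with three steps (hypothetical numbers).
returns = rewards_to_go([1.0, 0.0, 2.0], gamma=0.5)
print(returns)  # [1.5, 1.0, 2.0]

log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
loss = vpg_surrogate_loss(log_probs, returns)
```

In a real training loop the log-probabilities would come from the policy network, and minimizing this loss with a gradient-based optimizer raises the probability of actions that led to above-baseline returns.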

The VPG actor was trained on the LunarLander-v2 Gym environment for 4M environment steps.

The graph of average return versus training step is shown below (batch_size=40000):

[Figure: vpg_training]

Resulting policy:

vpg_run.mp4