PolicyGradients

Single-threaded PyTorch implementations of REINFORCE, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO).
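For orientation, here is a minimal, dependency-free sketch of two quantities these methods are usually built on: the discounted reward-to-go used by REINFORCE-style updates, and the per-sample clipped surrogate from PPO. The function names, the default `gamma`, and the default `clip_eps` are illustrative assumptions and are not taken from this repository's code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go G_t for every step of one episode.

    Hypothetical helper for illustration; not this repository's API.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s); hypothetical helper for illustration.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

For example, `discounted_returns([1, 1, 1], gamma=0.5)` gives `[1.75, 1.5, 1.0]`. In a PyTorch training loop the same arithmetic would normally be applied to tensors, with the negated surrogate used as the loss.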

LunarLander-v2

[Images: LunarLander GIF, LunarLander comparison]

Acrobot-v1

[Images: Acrobot-v1, Acrobot comparison]

CartPole-v0

[Images: CartPole-v0, CartPole comparison]

Train

python train.py \
  --algo TRPO \
  --seeds 10 20 30 \
  --env_name LunarLander-v2

Test

python test.py \
  --algo PPO \
  --env_name Acrobot-v1 \
  --seed 10

Plot

python plot.py \
  --algos REINFORCE PPO TRPO \
  --env_name LunarLander-v2
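
When an algorithm has been trained with several seeds, a common way to compare algorithms is to average the per-episode returns across seeds and show the spread. The sketch below illustrates that aggregation in plain Python. Whether `plot.py` does exactly this is an assumption, and `aggregate_over_seeds` and its input layout are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_over_seeds(curves):
    """Average per-episode returns across seeds.

    `curves` maps a seed to its list of episode returns (hypothetical layout).
    Returns (means, stds), one entry per episode index.
    """
    if not curves:
        raise ValueError("need at least one seed")
    if len({len(c) for c in curves.values()}) != 1:
        raise ValueError("all seeds must have the same number of episodes")
    # zip(*...) groups the k-th episode of every seed together.
    per_episode = list(zip(*curves.values()))
    means = [mean(values) for values in per_episode]
    # Population standard deviation across seeds for each episode.
    stds = [pstdev(values) for values in per_episode]
    return means, stds


# Example with three seeds and two episodes each (made-up numbers).
means, stds = aggregate_over_seeds({10: [1.0, 2.0], 20: [3.0, 4.0], 30: [5.0, 6.0]})
```

The resulting mean curve can be drawn as a line with a shaded band of one standard deviation around it, one line per algorithm.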

References