PolicyGradients
Single-threaded PyTorch implementation of REINFORCE, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO).
LunarLander-v2
Acrobot-v1
CartPole-v0
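As a reminder of the quantity that REINFORCE weights its log-probabilities by (and that TRPO and PPO advantage estimates are usually built from), here is a minimal sketch of the discounted return-to-go computation. It is not taken from this repository: the function name `discounted_returns` and the default `gamma=0.99` are assumptions for illustration, and it uses plain Python lists rather than PyTorch tensors so it runs without dependencies.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for one episode.

    Hypothetical helper for illustration; the name and the default
    discount factor are assumptions, not this repository's code.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each step reuses the next step's return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Three steps of reward 1 with gamma = 0.5.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

In a REINFORCE-style update, each action's log-probability would typically be scaled by the return at its time step, often after subtracting a baseline to reduce variance.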
Train
python train.py \
--algo TRPO \
--seeds 10 20 30 \
--env_name LunarLander-v2
Test
python test.py \
--algo PPO \
--env_name Acrobot-v1 \
--seed 10
Plot
python plot.py \
--algos REINFORCE PPO TRPO \
--env_name LunarLander-v2
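Comparing several algorithms in one plot presumably requires a training run for each of them first. The sketch below builds one train.py command per algorithm, using only the flags shown in the Train section. The helper name `build_train_command` is hypothetical, and it is an assumption that train.py accepts REINFORCE and PPO for --algo in the same way as TRPO.

```python
import sys


def build_train_command(algo, env_name, seeds):
    """Build the argument list for one train.py run.

    Hypothetical helper; it only uses the flags shown in the Train section.
    """
    return [
        sys.executable, "train.py",
        "--algo", algo,
        "--seeds", *[str(seed) for seed in seeds],
        "--env_name", env_name,
    ]


# Assumption: train.py accepts the same algorithm names that plot.py lists.
ALGOS = ["REINFORCE", "PPO", "TRPO"]
commands = [build_train_command(algo, "LunarLander-v2", [10, 20, 30]) for algo in ALGOS]

if __name__ == "__main__":
    for cmd in commands:
        # Print each command without the interpreter path; to actually launch
        # it, pass the full list to subprocess.run(cmd, check=True).
        print(" ".join(cmd[1:]))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems and keeps the seed list easy to change in one place.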