grassking100/reinforcement_learning

An implementation of reinforcement learning for CartPole-v0 using policy optimization
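To make "policy optimization" concrete, here is a minimal sketch of one common variant, REINFORCE with a linear logistic policy. It is not taken from this repository: the hand-written cart-pole dynamics stand in for the Gym environment so the block runs with the standard library alone, and every name in it (`step`, `prob_right`, `run_episode`, `train`) as well as the hyperparameters are illustrative assumptions.

```python
import math
import random

# Constants follow the classic cart-pole formulation (assumed to match CartPole-v0).
GRAVITY, CART_MASS, POLE_MASS, POLE_HALF_LEN = 9.8, 1.0, 0.1, 0.5
FORCE, DT, MAX_STEPS = 10.0, 0.02, 200
X_LIMIT, THETA_LIMIT = 2.4, 12 * math.pi / 180


def step(state, action):
    """Advance the cart-pole by one Euler step; return (next_state, done)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_ml = POLE_MASS * POLE_HALF_LEN
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - pole_ml * theta_acc * cos_t / total_mass
    nxt = (x + DT * x_dot, x_dot + DT * x_acc,
           theta + DT * theta_dot, theta_dot + DT * theta_acc)
    done = abs(nxt[0]) > X_LIMIT or abs(nxt[2]) > THETA_LIMIT
    return nxt, done


def prob_right(weights, state):
    """Probability of pushing right under a logistic policy."""
    z = sum(w * s for w, s in zip(weights, state))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every time step."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def run_episode(weights, rng):
    """Sample one episode; each recorded step earns a reward of +1."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    trajectory = []
    for _ in range(MAX_STEPS):
        action = 1 if rng.random() < prob_right(weights, state) else 0
        trajectory.append((state, action))
        state, done = step(state, action)
        if done:
            break
    return trajectory


def train(episodes=300, lr=0.01, gamma=0.99, seed=0):
    """Gradient ascent on the expected return, one update per episode."""
    rng = random.Random(seed)
    weights = [0.0] * 4
    lengths = []
    for _ in range(episodes):
        trajectory = run_episode(weights, rng)
        returns = discounted_returns([1.0] * len(trajectory), gamma)
        grad = [0.0] * 4
        for (state, action), g in zip(trajectory, returns):
            p = prob_right(weights, state)
            # d/dw log pi(action | state) = (action - p) * state
            for i in range(4):
                grad[i] += (action - p) * state[i] * g
        weights = [w + lr * gi / len(trajectory)
                   for w, gi in zip(weights, grad)]
        lengths.append(len(trajectory))
    return weights, lengths
```

Calling `train()` returns the learned weights and the per-episode lengths, which equal the episode rewards under the +1-per-step convention. The learning rate and episode count are untuned guesses, so how quickly (or whether) this sketch approaches the 200-step cap is not guaranteed.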

The step plot of the result

Histogram of the results of 100 simulations (mean value 199)
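A summary like this one can be produced by running the trained policy repeatedly and binning the episode scores. The sketch below shows one way to do that with the standard library; `summarize_runs`, `run_once`, and the bin width are hypothetical names and choices, not part of this repository.

```python
from collections import Counter
from statistics import mean


def summarize_runs(run_once, n_runs=100, bin_width=10):
    """Call `run_once` n_runs times; return (scores, mean score, histogram)."""
    scores = [run_once() for _ in range(n_runs)]
    # Map each score to the lower edge of its bin, then count scores per bin.
    histogram = Counter(int(s // bin_width) * bin_width for s in scores)
    return scores, mean(scores), dict(sorted(histogram.items()))
```

Here `run_once` is assumed to be any zero-argument function that plays a single CartPole-v0 episode and returns its total reward; the returned histogram maps each bin's lower edge to the number of episodes that fell in it.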

References

CartPole-v0: https://gym.openai.com/envs/CartPole-v0/
Ilyas, Andrew, et al. "A closer look at deep policy gradients." arXiv preprint arXiv:1811.02553 (2018).