CartPole policy gradient

Uses the Monte Carlo policy gradient method (REINFORCE) without a baseline to learn a
policy for playing CartPole.
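
The update rule behind this method can be sketched in a few lines of NumPy. This is only an illustration, not the code in main.py: it assumes a linear softmax policy, and the names discounted_returns, softmax, reinforce_gradient, theta and gamma are hypothetical. The actual model in this repo is built with TensorFlow and may differ.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Monte Carlo policy gradient estimate for one episode, with no baseline.

    theta has shape (n_features, n_actions) and defines a linear softmax
    policy (an assumption made for this sketch only).
    """
    returns = discounted_returns(rewards, gamma)
    grad = np.zeros_like(theta)
    for state, action, g in zip(states, actions, returns):
        probs = softmax(state @ theta)
        # Gradient of log pi(action | state) w.r.t. the logits:
        # one_hot(action) - probs
        dlog = -probs
        dlog[action] += 1.0
        # Weight the score function by the full return G_t (no baseline).
        grad += g * np.outer(state, dlog)
    return grad
```

A gradient ascent step would then be theta += learning_rate * reinforce_gradient(...), applied once per sampled episode. Because no baseline is subtracted from the returns, the estimate is unbiased but can have high variance.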

Dependencies

numpy
tensorflow
gym

How to Run

Go to the root directory of this repo and run:

$ python main.py

To inspect the training results in more detail, use TensorBoard:

$ tensorboard --logdir=/tmp/tensorflow_logs