CartPole policy gradient

Uses the Monte Carlo policy gradient method (REINFORCE) without a baseline to learn a
policy for playing CartPole.
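
As a rough illustration of the method named above, here is a minimal NumPy sketch of the no-baseline Monte Carlo gradient estimate for a single episode. It is not the code in `main.py`: the linear softmax policy, the names `discounted_returns` and `reinforce_gradient`, and the default `gamma=0.99` are all assumptions made for this example.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def softmax(z):
    """Numerically stable softmax over a 1-D array of action scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Monte Carlo policy gradient estimate without a baseline.

    theta has shape (n_actions, state_dim) and parameterizes a hypothetical
    linear softmax policy pi(a | s) = softmax(theta @ s)[a].
    """
    returns = discounted_returns(rewards, gamma)
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta @ s)
        one_hot = np.zeros(len(probs))
        one_hot[a] = 1.0
        # Gradient of log pi(a | s) w.r.t. theta, weighted by the raw return.
        grad += g * np.outer(one_hot - probs, s)
    return grad
```

Taking a step such as `theta += learning_rate * reinforce_gradient(...)` after each episode performs gradient ascent on the expected return. Because the raw return is used as the weight with no baseline subtracted, the estimate is unbiased but can have high variance.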

Dependencies

Go to the root directory of this repo and run:

$ python main.py

To inspect the training results in more detail, use TensorBoard:

$ tensorboard --logdir=/tmp/tensorflow_logs