Uses the Monte Carlo policy gradient method (REINFORCE) without a baseline to find
an optimal policy for playing CartPole.
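
For readers unfamiliar with the method, here is a minimal sketch of the quantity a Monte Carlo policy gradient update without a baseline works with. It is not taken from main.py. The function names, the discount factor gamma=0.99, and the use of plain Python lists are assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for each step of one episode.

    Hypothetical helper for illustration; gamma=0.99 is an assumed default.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss: -sum_t log pi(a_t | s_t) * G_t.

    No baseline is subtracted from G_t, matching the "without baseline"
    variant. Minimizing this loss follows the Monte Carlo policy gradient.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a full training loop, log_probs would come from the policy network for the actions actually taken, and the loss would be minimized with an automatic-differentiation framework once per completed episode.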
Go to the root directory of this repo and run:
$ python main.py
To inspect the training results in more detail, use TensorBoard:
$ tensorboard --logdir=/tmp/tensorflow_logs