rl

Some experiments with RL, using TensorFlow:

cart.py -- vanilla policy gradient learning + entropy regularisation for OpenAI Gym's CartPole problem
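
For orientation, here is a minimal sketch of the kind of objective a vanilla policy gradient with entropy regularisation minimises. It is not taken from cart.py: it uses only the Python standard library rather than TensorFlow, and the names `pg_loss`, `discounted_returns`, `gamma` and `beta` are hypothetical, chosen for illustration. It assumes a softmax policy over discrete actions (CartPole has two) and computes the loss for one episode; an actual training script would differentiate this with respect to the policy parameters.

```python
import math


def softmax(logits):
    """Turn raw action scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def entropy(probs):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pg_loss(logits_per_step, actions, rewards, gamma=0.99, beta=0.01):
    """Loss to minimise for one episode (illustrative, not cart.py's code):

    -(mean over t of log pi(a_t | s_t) * G_t) - beta * (mean policy entropy)

    The entropy bonus, weighted by the assumed coefficient beta, rewards
    policies that stay uncertain and so discourages premature convergence.
    """
    returns = discounted_returns(rewards, gamma)
    pg_term = 0.0
    ent_term = 0.0
    for logits, action, g in zip(logits_per_step, actions, returns):
        probs = softmax(logits)
        pg_term += math.log(probs[action]) * g
        ent_term += entropy(probs)
    n = len(actions)
    return -(pg_term / n) - beta * (ent_term / n)


if __name__ == "__main__":
    # One three-step episode with made-up logits, actions and rewards.
    logits = [[0.0, 0.0], [1.0, -1.0], [0.2, 0.3]]
    print(pg_loss(logits, actions=[0, 0, 1], rewards=[1.0, 1.0, 1.0]))
```

Setting `beta=0.0` recovers the plain policy gradient loss; larger values trade some return for more exploration.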