Using policy gradients with REINFORCE to learn Tic-Tac-Toe

Sources:
Karpathy's RL blog (http://karpathy.github.io/2016/05/31/rl/)
https://github.com/yukezhu/tensorflow-reinforce/blob/master/rl/pg_reinforce.py
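
As a rough illustration of what REINFORCE on Tic-Tac-Toe can look like, here is a minimal standard-library sketch. It is not the code from the sources above. It assumes a linear softmax policy over the nine cells, a uniformly random opponent, and an undiscounted end-of-game reward (+1 win, -1 loss, 0 draw) credited to every move, with no baseline. All names (`winner`, `policy`, `play_episode`, `reinforce_update`, `train`) are hypothetical.

```python
import math
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return +1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def policy(weights, board):
    """Softmax over legal moves; occupied cells get probability 0.

    weights[a] holds 9 cell weights plus a trailing bias for action a
    (an assumed parameterization, chosen only for brevity).
    """
    legal = [i for i in range(9) if board[i] == 0]
    logits = {a: weights[a][9] + sum(weights[a][j] * board[j] for j in range(9))
              for a in legal}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {a: math.exp(z - top) for a, z in logits.items()}
    total = sum(exps.values())
    return [exps[a] / total if a in exps else 0.0 for a in range(9)]


def play_episode(weights, rng):
    """Agent (+1) moves first against a uniformly random opponent (-1)."""
    board, steps = [0] * 9, []
    while True:
        probs = policy(weights, board)
        legal = [i for i in range(9) if board[i] == 0]
        action = rng.choices(legal, weights=[probs[a] for a in legal])[0]
        steps.append((board[:], action, probs))
        board[action] = 1
        if winner(board) or 0 not in board:
            break
        empty = [i for i in range(9) if board[i] == 0]
        board[rng.choice(empty)] = -1
        if winner(board) or 0 not in board:
            break
    return steps, float(winner(board))


def reinforce_update(weights, steps, reward, lr=0.1):
    """Gradient ascent on reward * log pi(action | board) for one episode."""
    for board, action, probs in steps:
        features = board + [1]  # trailing 1 is the bias input
        for k in range(9):
            if board[k] != 0:
                continue  # occupied cells are masked out of the softmax
            # d log pi(action) / d logit_k for a softmax policy
            grad_logit = (1.0 if k == action else 0.0) - probs[k]
            for j in range(10):
                weights[k][j] += lr * reward * grad_logit * features[j]


def train(episodes=2000, seed=0):
    """Run REINFORCE for a number of self-contained episodes."""
    rng = random.Random(seed)
    weights = [[0.0] * 10 for _ in range(9)]
    for _ in range(episodes):
        steps, reward = play_episode(weights, rng)
        reinforce_update(weights, steps, reward)
    return weights
```

Calling `weights = train()` returns the learned parameters, and `policy(weights, board)` then gives move probabilities for any board encoded as +1 (agent), -1 (opponent), 0 (empty). Discounting and reward normalization, as discussed in Karpathy's post, are left out here to keep the sketch short.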