TicTacToeRL

A simple implementation of Q-learning for Tic Tac Toe.

Q-learning parameters:

- Alpha = 0.9 (learning rate)
- Gamma = 1.0 (discount rate)
- Epsilon = 0.8 (probability of a random move)

Converges in approximately 50 000 training games.
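
To illustrate how the three parameters listed above enter tabular Q-learning, here is a minimal sketch. It is not taken from this repository: the names `q_table`, `choose_action`, and `update`, and the idea of representing a board as a hashable state key, are assumptions made for the example.

```python
import random
from collections import defaultdict

ALPHA = 0.9    # learning rate
GAMMA = 1.0    # discount rate
EPSILON = 0.8  # probability of choosing a random move

# Hypothetical Q-table: maps (state, action) to an estimated value.
# Unseen pairs default to 0.0.
q_table = defaultdict(float)


def choose_action(state, legal_actions, rng=random):
    """Epsilon-greedy selection: explore with probability EPSILON,
    otherwise pick the legal action with the highest stored value."""
    if rng.random() < EPSILON:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_table[(state, a)])


def update(state, action, reward, next_state, next_legal_actions):
    """Standard Q-learning update:
    Q(s, a) += ALPHA * (reward + GAMMA * max_a' Q(s', a') - Q(s, a)).
    A terminal next state has no legal actions, so its value is 0."""
    best_next = max(
        (q_table[(next_state, a)] for a in next_legal_actions),
        default=0.0,
    )
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
    return q_table[(state, action)]
```

In this sketch, a training game would call `choose_action` for each move and `update` once the resulting state and reward are known. How the opponent's reply and the reward signal are handled is left open here, since the description above does not specify it.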