This repository contains a reinforcement learning project in which an agent is initialized with several default rewards (e.g. for a win, a loss, or a tie), trained by playing against itself, and then made available to the user through a feature-rich CLI.
This 2-day project was inspired by the classic work of Sutton and Barto (2012), particularly the self-play discussion in exercise 1.1. I also referenced @tansey's repository to clarify some details of a particular implementation.
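For the curious, the sketch below shows the kind of tabular, self-play value update that exercise describes. It is only an illustration under assumed names and constants (`values`, `ALPHA`, the neutral 0.5 default), not the repository's actual code.

```python
# Illustrative only: a TD(0)-style value update in the spirit of
# Sutton & Barto's exercise 1.1. The repository's real code and
# constants (learning rate, default rewards) may differ.
ALPHA = 0.1  # learning rate (assumed)

# Default rewards assigned to terminal outcomes for the learning agent.
DEFAULT_REWARDS = {"win": 1.0, "lose": 0.0, "tie": 0.5}

def update_value(values, state, next_state):
    """Nudge the value of `state` toward the value of `next_state`."""
    v = values.get(state, 0.5)        # unseen states start at a neutral 0.5
    v_next = values.get(next_state, 0.5)
    values[state] = v + ALPHA * (v_next - v)
```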
Assuming a package manager such as pip, the following dependencies are needed to run the CLI:
pip install urwid urwid_timed_progress tabulate funcy phi pyrsistent numpy
To run, simply call python tic-tac-toe from within the directory.
In case you're a lover of flow charts and sequence diagrams, here's a quick sketch of the program flow at the top level.
- Why does the "AI Brain" section seem to show the initial values?
Because the values are very close to the original values, they appear to be the same. However, if you look at the actual values, they are different. This distinction is important for the AI, as it currently chooses the move with the highest reward for each state.
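To make the point concrete, here is a hypothetical sketch of greedy move selection over the learned values (the function and variable names are illustrative, not the project's actual API): even tiny differences between values decide which move gets picked.

```python
def choose_move(values, candidate_states):
    """Pick the candidate board state with the highest learned value.

    `values` maps a board state to its learned value; unseen states
    fall back to a neutral 0.5. Names here are illustrative only.
    """
    return max(candidate_states, key=lambda state: values.get(state, 0.5))
```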
- Where can I get the values that the algorithm produces?
When you run the project, a file called rewards.tsv is produced that includes the weights for the different scenarios encountered during training for the agent playing as the X mark.
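If you want to poke at those weights yourself, a quick sketch like the following will do; note that the exact column layout of rewards.tsv is an assumption here (board state in one column, learned weight in another).

```python
import csv

# Peek at the first few rows of the learned weights. The column layout
# suggested in the comment below is assumed, not guaranteed by the project.
with open("rewards.tsv", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for row in list(reader)[:10]:
        print(row)  # e.g. [<board state>, <learned weight>]
```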
Sutton, R. S., & Barto, A. G. (2012). Reinforcement learning: An introduction (2nd ed.). Cambridge, MA: The MIT Press.