tictactoe-reinforcement-learning

Using tabular RL (value iteration) to train a tic-tac-toe agent

Primary language: Python. License: Apache-2.0.

Tabular Reinforcement Learning applied to Tic-Tac-Toe

This project models tic-tac-toe as a Markov decision process (MDP) and applies value iteration to teach a reinforcement learning agent to play it. The code is written in Python from scratch, and the resulting policy is near-optimal.

The memory folder contains the initialized and re-evaluated state-value pairs, which can be loaded with Python's pickle module.
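As a rough illustration of how a pickled state-value table can be written and read back, here is a minimal sketch. The file name, the board encoding (a 9-character string), and the helper names are assumptions made for this example and are not taken from the repository. Only unpickle files from a source you trust, since pickle can execute arbitrary code on load.

```python
import os
import pickle
import tempfile


def save_values(values, path):
    """Write a state -> value dictionary to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(values, f)


def load_values(path):
    """Read a state -> value dictionary back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical table: boards are 9-character strings read row by row,
# with "." marking an empty cell. The real files may use another layout.
example_values = {"XXXOO....": 1.0, ".........": 0.0}

# Hypothetical file name, written to a temporary folder for the demo.
example_path = os.path.join(tempfile.mkdtemp(), "example_values.pkl")
save_values(example_values, example_path)
restored = load_values(example_path)
```

To inspect the real files, point `load_values` at a file in the memory folder and check the type of the returned object first, since its exact structure is not documented here.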


Guidelines:

Step 1. Extract all the possible states and initialize their values:

python3 state_extractor.py
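The following is a minimal sketch of what state extraction can look like, not the contents of state_extractor.py. It assumes X moves first, play stops as soon as someone wins, boards are 9-character strings, and every value starts at 0.0; all function names are hypothetical.

```python
# The eight winning lines, as index triples into a 9-character board string.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that mark has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def player_to_move(board):
    """X moves first, so X is to move whenever the mark counts are equal."""
    return "X" if board.count("X") == board.count("O") else "O"


def enumerate_states():
    """Collect every board reachable from the empty board by legal play."""
    start = "." * 9
    seen = {start}
    stack = [start]
    while stack:
        board = stack.pop()
        # Terminal boards (win or full board) have no successors.
        if winner(board) is not None or "." not in board:
            continue
        mark = player_to_move(board)
        for i, cell in enumerate(board):
            if cell == ".":
                child = board[:i] + mark + board[i + 1:]
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
    return seen


# Assumed initialization: every state starts with value 0.0.
initial_values = {state: 0.0 for state in enumerate_states()}
```

Under these assumptions the enumeration yields 5,478 boards, including the empty board and all terminal positions.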

Step 2. Run value iteration over all the states until convergence:

python3 value_iterator.py
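To make the idea concrete, here is a hedged sketch of value iteration on one possible MDP formulation; it is not the code in value_iterator.py. The assumptions are: the agent plays X, the opponent plays O uniformly at random, rewards are +1 for a win, -1 for a loss, and 0 for a draw, and the discount factor defaults to 1. All names are hypothetical.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def empty_cells(board):
    return [i for i, cell in enumerate(board) if cell == "."]


def place(board, index, mark):
    return board[:index] + mark + board[index + 1:]


def outcome(board):
    """Terminal reward from X's point of view, or None if play continues."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return 1.0 if board[a] == "X" else -1.0
    return 0.0 if "." not in board else None


def action_value(board, move, values, gamma):
    """Expected return of X playing `move`, then O replying at random."""
    after_x = place(board, move, "X")
    reward = outcome(after_x)
    if reward is not None:
        return reward
    replies = empty_cells(after_x)
    total = 0.0
    for reply in replies:
        after_o = place(after_x, reply, "O")
        reward = outcome(after_o)
        total += reward if reward is not None else gamma * values[after_o]
    return total / len(replies)


def collect_states(board, states):
    """Gather all non-terminal boards where X is to move."""
    if board in states:
        return
    states.add(board)
    for move in empty_cells(board):
        after_x = place(board, move, "X")
        if outcome(after_x) is not None:
            continue
        for reply in empty_cells(after_x):
            after_o = place(after_x, reply, "O")
            if outcome(after_o) is None:
                collect_states(after_o, states)


def value_iteration(gamma=1.0, tol=1e-9):
    """Sweep the Bellman optimality update until values stop changing."""
    states = set()
    collect_states("." * 9, states)
    values = {state: 0.0 for state in states}
    while True:
        delta = 0.0
        for state in states:
            best = max(action_value(state, move, values, gamma)
                       for move in empty_cells(state))
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_move(board, values, gamma=1.0):
    """Pick the move with the highest expected return under `values`."""
    return max(empty_cells(board),
               key=lambda move: action_value(board, move, values, gamma))


values = value_iteration()
```

Because a tic-tac-toe game always ends within nine moves, the state graph has no cycles and the sweeps settle after a handful of passes. A different opponent model (for example, an adversarial one) would change the update inside `action_value` but not the overall loop.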

Step 3. Check the policy in one of several ways:

  • Method I: The AI plays against a randomly playing agent:
    python3 markov_eval.py
  • Method II: The AI plays against itself:
    python3 against_itself.py
  • Method III: The AI plays against a human:
    python3 human.py
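The evaluation methods above all come down to playing many games between two policies and tallying the results. The sketch below shows one way to structure such a harness; it is not taken from markov_eval.py or against_itself.py. The policy signature `(board, mark, rng)`, the board encoding, and all names are assumptions, and a random policy stands in for the trained agent.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that mark has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_policy(board, mark, rng):
    """Baseline opponent: choose uniformly among the empty cells."""
    return rng.choice([i for i, cell in enumerate(board) if cell == "."])


def play_game(policy_x, policy_o, rng):
    """Play one game and return "X", "O", or "draw"."""
    board = "." * 9
    mark = "X"
    while winner(board) is None and "." in board:
        policy = policy_x if mark == "X" else policy_o
        move = policy(board, mark, rng)
        if board[move] != ".":
            raise ValueError(f"illegal move {move} on board {board}")
        board = board[:move] + mark + board[move + 1:]
        mark = "O" if mark == "X" else "X"
    return winner(board) or "draw"


def evaluate(policy_x, policy_o, n_games=1000, seed=0):
    """Tally outcomes over many games with a seeded random generator."""
    rng = random.Random(seed)
    tally = {"X": 0, "O": 0, "draw": 0}
    for _ in range(n_games):
        tally[play_game(policy_x, policy_o, rng)] += 1
    return tally


# Stand-in run: random versus random. Swap in a trained policy for
# policy_x to measure it against the random baseline (Method I), or pass
# the same trained policy for both sides (Method II).
results = evaluate(random_policy, random_policy, n_games=200)
```

A near-optimal agent should rarely or never lose against the random baseline, so the loss count in the tally is the most informative number to watch.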