/tictactoe3d-deep-rl

Applying reinforcement learning and deep reinforcement learning to Tic Tac Toe and Tic Tac Toe 3D respectively.


Final Project: Reinforcement Learning for Tic Tac Toe

This is a project submission for the course Dynamic Programming and Reinforcement Learning (CS 6314) at Lahore University of Management Sciences.

For further implementation details, please refer to the project report.

Overview

Phase 1: 2D Tic Tac Toe (3x3 Grid) - Value Iteration Implementation

  • Objective: Build an AI agent using value iteration to play 2D Tic Tac Toe on a 3x3 grid.
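As a rough illustration of what value iteration looks like for this game, here is a minimal sketch. It makes several assumptions that are not taken from the project: the agent always plays X, the opponent replies uniformly at random, rewards are +1 for a win, -1 for a loss and 0 otherwise, and all function names are hypothetical. The notebooks remain the reference for the actual implementation.

```python
# Winning index triples on a 3x3 board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' for a completed line, 'D' for a full board, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "D" if " " not in board else None

def legal(board):
    return [i for i, c in enumerate(board) if c == " "]

def place(board, i, mark):
    return board[:i] + mark + board[i + 1:]

def outcomes(board, action):
    """X plays `action`, then O replies uniformly at random (assumed model).
    Returns (probability, reward, next_board) triples; next_board is None
    when the game has ended."""
    after_x = place(board, action, "X")
    w = winner(after_x)
    if w is not None:
        return [(1.0, 1.0 if w == "X" else 0.0, None)]
    replies = legal(after_x)
    p = 1.0 / len(replies)
    result = []
    for i in replies:
        after_o = place(after_x, i, "O")
        w = winner(after_o)
        if w is None:
            result.append((p, 0.0, after_o))
        else:
            result.append((p, -1.0 if w == "O" else 0.0, None))
    return result

def q_value(V, board, action, gamma):
    """Expected return of taking `action` in `board` under the current V."""
    return sum(p * (r + (gamma * V[n] if n is not None else 0.0))
               for p, r, n in outcomes(board, action))

def reachable_states():
    """All non-terminal boards in which it is X's turn, found by search."""
    start = " " * 9
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for a in legal(s):
            for _, _, n in outcomes(s, a):
                if n is not None and n not in seen:
                    seen.add(n)
                    frontier.append(n)
    return seen

def value_iteration(gamma=1.0, tol=1e-9):
    states = reachable_states()
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: best expected return over legal moves.
            best = max(q_value(V, s, a, gamma) for a in legal(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(legal(s), key=lambda a: q_value(V, s, a, gamma))
              for s in states}
    return V, policy

V, policy = value_iteration()
```

Because the game always ends within a few moves, the sweeps settle quickly. The resulting `policy` maps each board string to a cell index, which is one plausible shape for a saved policy, though the project's own format may differ.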

Phase 2: 2D Tic Tac Toe (4x4 Grid) - Q-Learning Implementation

  • Objective: Construct an AI agent using Q-learning to play 2D Tic Tac Toe on a 4x4 grid.
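For readers unfamiliar with tabular Q-learning, the core of the method is a single update rule plus an exploration strategy. The sketch below shows both under assumed details: the table is a dictionary keyed by (state, action), and the names `epsilon_greedy` and `q_update` as well as the default `alpha` and `gamma` values are illustrative, not taken from the project.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon explore, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(Q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    An empty `next_actions` list marks a terminal next state."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

# Unseen (state, action) pairs default to a value of 0.0.
Q = defaultdict(float)
```

In a training loop, the agent would call `epsilon_greedy` to choose a move, observe the reward and the next board, and then call `q_update`. A board can be used as a state key once it is converted to something hashable, such as a tuple or string.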

Phase 3: 3D Tic Tac Toe (4x4x4 Grid) - Open-Ended AI Design Competition

  • Objective: Develop an AI model for 3D Tic Tac Toe on a 4x4x4 grid, employing various approaches to create an optimal policy.
  • The model used is Temporal Difference learning with a neural network as the value-function approximator, known as Neural Temporal Difference Learning.
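The project report is the reference for the exact update used. As a general sketch, TD learning with a neural value function is commonly written as the semi-gradient TD(0) update below. The notation is standard textbook notation rather than the project's own: V_θ is the network's value estimate, θ its weights, α the learning rate, γ the discount factor, and δ_t the TD error.

```latex
\begin{align}
\delta_t &= r_{t+1} + \gamma\, V_\theta(s_{t+1}) - V_\theta(s_t) \\
\theta &\leftarrow \theta + \alpha\, \delta_t\, \nabla_\theta V_\theta(s_t)
\end{align}
```

Here V_θ(s_{t+1}) is taken to be 0 when s_{t+1} is terminal, and the bootstrapped target is treated as a constant when computing the gradient.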

Training

  • To train each model, please refer to the notebooks in the code section of the repository.

Pre-trained Models

  • For phase 1, the policy is available as a .pkl file. The code to load and use it is available in the notebook.
  • For phase 2, the Q-table is available as a .pkl file. The code to load and use it is available in the notebook.
  • For phase 3, the latest trained model is available as a .pth file. The code to load and use it is available in the notebook.
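The notebooks contain the actual loading code. For orientation, a .pkl file is typically read with Python's standard `pickle` module, as in the sketch below. The file name and the helper names `save_pickle` and `load_pickle` are hypothetical, and the example writes its own temporary file so it can run without the repository's artifacts.

```python
import pickle
import tempfile
from pathlib import Path

def save_pickle(obj, path):
    """Serialize `obj` (e.g. a policy dict or Q-table) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Load a pickled object. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip through a temporary file standing in for a saved Q-table.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "q_table.pkl"  # hypothetical file name
    save_pickle({("state", 0): 0.5}, path)
    restored = load_pickle(path)
```

A .pth file is usually read in PyTorch with `torch.load(path, map_location="cpu")`. If the file holds a state dict rather than a whole model, the result is then passed to the model's `load_state_dict` method. Which of the two applies here is an assumption to check against the phase 3 notebook.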

Results

Phase 1

Evaluated against a random opponent, the policies achieved:

  • Wins as X: 99.5%
  • Draws as X: 0.5%
  • Wins as O: 80.7%
  • Draws as O: 19.3%
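Percentages like these can be produced by playing many games against a uniformly random opponent and tallying the outcomes. The following is a minimal sketch of such an evaluation loop for the 3x3 game. The function names, the `agent(board, rng)` calling convention, and the number of games are assumptions, and a random player stands in for the trained policy.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' for a completed line, 'D' for a full board, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "D" if " " not in board else None

def random_player(board, rng):
    """Pick any empty cell uniformly at random."""
    return rng.choice([i for i, c in enumerate(board) if c == " "])

def play_game(agent, agent_mark, rng):
    """Play one game; X always moves first. Returns 'X', 'O' or 'D'."""
    board = [" "] * 9
    mark = "X"
    while True:
        mover = agent if mark == agent_mark else random_player
        board[mover(board, rng)] = mark
        result = winner(board)
        if result is not None:
            return result
        mark = "O" if mark == "X" else "X"

def evaluate(agent, agent_mark, games=1000, seed=0):
    """Return win/draw/loss percentages for `agent` playing as `agent_mark`."""
    rng = random.Random(seed)
    tally = {"win": 0, "draw": 0, "loss": 0}
    for _ in range(games):
        result = play_game(agent, agent_mark, rng)
        if result == "D":
            tally["draw"] += 1
        elif result == agent_mark:
            tally["win"] += 1
        else:
            tally["loss"] += 1
    return {k: 100.0 * v / games for k, v in tally.items()}

# Baseline: a random agent playing as X against a random opponent.
baseline = evaluate(random_player, "X", games=200)
```

To evaluate a learned policy, `random_player` would be replaced by a function that looks up the agent's move for the current board.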

Phase 2

The results we obtained (against a random player) were:

  • Wins as O: 54.7%
  • Draws as O: 24.9%
  • Wins as X: 28.2%
  • Draws as X: 39.8%

Phase 3

Playing against the online version of the game Tic Tac Toe 3D, our agent defeated the Easy bot but not the Difficult bot. With further training, it may be able to defeat the latter as well.

Credits