Tic Tac Transformer

A tiny GPT trained to play tic-tac-toe

How does it work?

We teach a language model to speak tic-tac-toe

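The language is simple: there are 11 tokens. Going by the example below, 0 to 8 name the board cells in row-major order, <9> starts a game, and <10> pads a finished game out to full length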

The sequence length is 10: a game always starts with <9>, followed by at most 9 moves, which is enough to fill the board

Players take turns

Duplicate moves are illegal
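The rules above can be sketched as a small validity check. This is an illustration rather than code from this repository: the function name `is_valid_game` and the constants are hypothetical, and the roles of tokens 9 (start) and 10 (padding) are assumed from the example sequence in this README. Turn-taking needs no separate check here, because moves alternate by position in the sequence. The sketch does not verify that play stops once someone has won.

```python
START, PAD = 9, 10  # assumed roles of tokens 9 and 10
SEQ_LEN = 10        # one start token plus at most nine moves


def is_valid_game(seq):
    """Return True if seq is a well-formed game under the rules above."""
    if len(seq) != SEQ_LEN or seq[0] != START:
        return False
    moves = []
    padding_started = False
    for tok in seq[1:]:
        if tok == PAD:
            padding_started = True
        elif padding_started or tok not in range(9):
            return False  # a move after padding, or an unexpected token
        else:
            moves.append(tok)
    # Duplicate moves are illegal: every cell may appear at most once.
    return len(set(moves)) == len(moves)


print(is_valid_game([9, 4, 0, 2, 1, 6, 10, 10, 10, 10]))  # legal game
print(is_valid_game([9, 4, 4, 2, 1, 6, 10, 10, 10, 10]))  # cell 4 played twice
```

A check like this could be used to filter generated games or to mask illegal moves at sampling time, though whether the scripts here do so is not stated.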

Example

seq: [9, 4, 0, 2, 1, 6, 10, 10, 10, 10]

[O] [O] [X]
[ ] [X] [ ]
[X] [ ] [ ]

player 1 (X) wins on the diagonal 2, 4, 6
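To make the example concrete, here is a minimal sketch that decodes a token sequence into a board and looks for a winner. The helper names (`decode`, `winner`, `render`) are hypothetical and not taken from this repository. It assumes that cells are numbered 0 to 8 in row-major order, that player 1 moves first and plays X, and that tokens 9 and 10 are the start and padding markers, all of which match the example above.

```python
START, PAD = 9, 10  # assumed roles of tokens 9 and 10

# All eight winning lines on a board numbered 0..8 in row-major order.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def decode(seq):
    """Turn a token sequence into a list of nine cells: 'X', 'O' or ' '."""
    board = [" "] * 9
    moves = [tok for tok in seq if tok not in (START, PAD)]
    for turn, cell in enumerate(moves):
        board[cell] = "X" if turn % 2 == 0 else "O"  # player 1 (X) moves first
    return board


def winner(board):
    """Return 1 or 2 if that player has three in a row, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return 1 if board[a] == "X" else 2
    return None


def render(board):
    """Format the board in the same style as the example above."""
    rows = (board[i:i + 3] for i in range(0, 9, 3))
    return "\n".join(" ".join(f"[{cell}]" for cell in row) for row in rows)


board = decode([9, 4, 0, 2, 1, 6, 10, 10, 10, 10])
print(render(board))
print("winner:", winner(board))
```

Run on the example sequence, this reproduces the board shown above and reports player 1 as the winner.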

Play the AI!

python play_ai.py

Generate pre-training data

python generate_data.py

Run pre-training

python train.py

RL fine-tuning

python reinforcement_learn.py

Run benchmark

python benchmark.py