Implementation of MuZero for Tic-Tac-Toe.
It plays optimally 65-70% of the time, provided you train it long enough and get lucky.
RL is hard (ToT)
git clone https://github.com/souvikshanku/tic-tac-toe-zero.git
cd tic-tac-toe-zero
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
# End-to-end training
python3 self_play.py
# Play against a 'random' agent
python3 check_accuracy.py
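The contents of check_accuracy.py are not shown here. The sketch below is only a hypothetical illustration of what evaluating against a 'random' agent means: an opponent that picks uniformly among the empty cells. The board encoding (a list of 9 cells holding 0, 1 or -1) and the names `random_agent`, `play_game` and `evaluate` are assumptions, not the repo's API.

```python
import random

# Hypothetical board encoding (an assumption, not the repo's): a list of 9 cells,
# each 0 (empty), 1 (player X) or -1 (player O).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def legal_moves(board):
    """Indices of the empty cells."""
    return [i for i, c in enumerate(board) if c == 0]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def random_agent(board):
    """Hypothetical baseline: choose uniformly among the empty cells."""
    return random.choice(legal_moves(board))


def play_game(agent_x, agent_o):
    """Play one game; return 1 if X wins, -1 if O wins, 0 for a draw."""
    board = [0] * 9
    player = 1  # X moves first
    while legal_moves(board) and winner(board) == 0:
        agent = agent_x if player == 1 else agent_o
        board[agent(board)] = player
        player = -player
    return winner(board)


def evaluate(agent, n_games=1000, seed=0):
    """Count wins/draws/losses of `agent` against the random baseline."""
    random.seed(seed)
    results = {"win": 0, "draw": 0, "loss": 0}
    for g in range(n_games):
        # Alternate who moves first so `agent` plays both X and O.
        if g % 2 == 0:
            outcome = play_game(agent, random_agent)
        else:
            outcome = -play_game(random_agent, agent)
        key = "win" if outcome == 1 else "loss" if outcome == -1 else "draw"
        results[key] += 1
    return results


if __name__ == "__main__":
    print(evaluate(random_agent, n_games=1000))
```

Any function that maps a board to a cell index can be passed as `agent`; a trained MuZero policy would need a thin wrapper to fit this assumed interface.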