cwhy/mcts_python


MCTS (actually PUCT) in Python
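For reference, PUCT picks at each node the action a that maximizes Q(s,a) + c_puct · P(s,a) · sqrt(N(s)) / (1 + N(s,a)). Here P is the policy network's prior, N(s,a) is the visit count of a, and N(s) is the parent's total visit count. The following is a minimal sketch of that rule in the AlphaZero formulation. The function name, the argument layout and the value c_puct = 1.5 are illustrative assumptions, not this repo's code.

```python
import math
from typing import Sequence


def puct_select(
    priors: Sequence[float],    # P(s, a): prior from the policy network
    visits: Sequence[int],      # N(s, a): visit count of each action
    q_values: Sequence[float],  # Q(s, a): mean backed-up value of each action
    c_puct: float = 1.5,        # exploration constant (illustrative value)
) -> int:
    """Return the index of the action maximizing Q(s, a) + U(s, a)."""
    total = sum(visits)
    # Small epsilon so the priors still rank actions before any visits.
    sqrt_total = math.sqrt(total + 1e-8)

    def score(a: int) -> float:
        u = c_puct * priors[a] * sqrt_total / (1 + visits[a])
        return q_values[a] + u

    return max(range(len(priors)), key=score)
```

At an unvisited node, the prior alone decides. As visits accumulate, the exploration term shrinks for frequently chosen actions and Q dominates.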

Demo Games

TicTacToe
Reversi
Wuziqi (Gomoku)
Wuziqi-1swap (Gomoku with the one-move swap rule)

Supports

Custom environments with clear APIs
- Examples are in /games
Arbitrary number of agents with per-agent rewards
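To make "clear APIs" and "per-agent rewards" concrete, here is a sketch of what such an environment interface could look like, with TicTacToe as an example. The class `Game` and its method names are hypothetical and do not reproduce the actual API in /games. The key idea is that a terminal state returns one reward per agent rather than a single scalar, which is what allows more than two agents.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

State = Tuple[int, ...]


class Game(ABC):
    """Hypothetical environment interface (not this repo's actual API)."""

    n_agents: int

    @abstractmethod
    def initial_state(self) -> State: ...

    @abstractmethod
    def current_agent(self, state: State) -> int: ...

    @abstractmethod
    def legal_actions(self, state: State) -> List[int]: ...

    @abstractmethod
    def step(self, state: State, action: int) -> State: ...

    @abstractmethod
    def rewards(self, state: State) -> Optional[List[float]]:
        """Per-agent rewards if the game is over, else None."""


class TicTacToe(Game):
    n_agents = 2
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def initial_state(self) -> State:
        return (0,) * 9  # 0 = empty, 1 = agent 0's mark, 2 = agent 1's mark

    def current_agent(self, state: State) -> int:
        return sum(1 for c in state if c) % 2  # agents alternate turns

    def legal_actions(self, state: State) -> List[int]:
        return [i for i, c in enumerate(state) if c == 0]

    def step(self, state: State, action: int) -> State:
        board = list(state)
        board[action] = self.current_agent(state) + 1
        return tuple(board)

    def rewards(self, state: State) -> Optional[List[float]]:
        for a, b, c in self.LINES:
            if state[a] and state[a] == state[b] == state[c]:
                winner = state[a] - 1
                return [1.0 if i == winner else -1.0 for i in range(self.n_agents)]
        if all(state):
            return [0.0] * self.n_agents  # draw
        return None  # game not over yet
```

For example, the move sequence 0, 3, 1, 4, 2 completes the top row for agent 0. At that point `rewards` returns `[1.0, -1.0]`.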

Instructions

Run run_mcts.py to start.
Edit config.py to change the game and other settings.

Key parameters of MCTS (PUCT)

n_iters: number of training iterations. Larger values make the neural network stronger; training time grows linearly.
n_eps: number of self-play episodes per iteration. Larger values make training more robust; training time grows linearly.
n_mcts: number of MCTS simulations per move. Larger values mean more brute-force search samples; training and testing time grow polynomially.
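The parameters above nest inside one another, as shown by the following skeleton of an AlphaZero-style loop. It assumes a hypothetical `Config` stand-in for config.py and a fixed game length. It only counts the MCTS simulations performed and omits search, self-play and training. The simulation count is the product of all four numbers, so doubling n_iters or n_eps doubles it. The cost of each individual simulation is not constant, however, because it also depends on tree depth and network evaluation.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the settings in config.py.
    n_iters: int = 10           # outer training iterations
    n_eps: int = 20             # self-play episodes per iteration
    n_mcts: int = 50            # MCTS simulations per move
    moves_per_episode: int = 9  # assumption: fixed game length (a full TicTacToe board)


def count_simulations(cfg: Config) -> int:
    """Count the MCTS simulations a skeleton AlphaZero-style loop would run."""
    sims = 0
    for _ in range(cfg.n_iters):                 # each iteration ends with a network update
        for _ in range(cfg.n_eps):               # self-play episodes for this iteration
            for _ in range(cfg.moves_per_episode):
                sims += cfg.n_mcts               # one tree search per move
        # train_network(...) would go here (hypothetical, omitted)
    return sims


print(count_simulations(Config(n_iters=2, n_eps=3, n_mcts=5, moves_per_episode=4)))  # 120
```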

Possible Improvements

Add a Q head
Add a tunable alpha (α) for the Dirichlet noise
Cyclic learning rate
PPO for the policy
Episodic memory for the value
Population-based training
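The Dirichlet-noise item refers to the AlphaZero trick of perturbing the root priors as P' = (1 - ε) · P + ε · η, where η ~ Dir(α). The following sketch exposes α as a parameter. It is not code from this repo. The defaults are ε = 0.25 and α = 0.3, the values AlphaZero reported for chess. In practice the noise is applied only at the root and only over legal moves.

```python
import random
from typing import List, Optional, Sequence


def add_dirichlet_noise(
    priors: Sequence[float],
    alpha: float = 0.3,    # Dirichlet concentration; smaller = spikier noise
    epsilon: float = 0.25, # weight given to the noise
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Mix root priors with Dirichlet(alpha) noise: (1 - eps) * P + eps * eta."""
    rng = rng or random.Random()
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


noisy = add_dirichlet_noise([0.5, 0.3, 0.2], rng=random.Random(0))
```

Because both P and η sum to 1, the mixed priors still form a probability distribution. A smaller α is usually chosen for games with more legal moves.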

Interesting Literature

Loss related
Sampling
- Position Averaging
Learning rate
- Cyclic Learning Rate
Meta-train/Hyperparams
- Population-based tuning
Bandit
- Leena strategy
- Thompson Sampling
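A cyclic learning rate oscillates between a lower and an upper bound instead of decaying monotonically. The following sketch implements the basic "triangular" policy. The bounds and step size are illustrative assumptions. If the network is trained with PyTorch, `torch.optim.lr_scheduler.CyclicLR` (with `mode="triangular"`) provides the same schedule.

```python
import math


def triangular_clr(
    iteration: int,
    base_lr: float = 1e-4,  # lower bound (illustrative)
    max_lr: float = 1e-2,   # upper bound (illustrative)
    step_size: int = 100,   # iterations per half-cycle
) -> float:
    """Triangular cyclic learning rate: base -> max -> base every 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # 1 at cycle ends, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

With these defaults, the rate starts at 1e-4, peaks at 1e-2 at iteration 100, and returns to 1e-4 at iteration 200.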

Requirements

Python 3.8+
See requirement.txt for the remaining dependencies.