- TicTacToe
- Reversi
- Wuziqi (Gomoku)
- Wuziqi-1swap (Gomoku with the one-move swap rule)
- Custom environment with clear APIs
- Examples are in `/games`
- Arbitrary number of agents with per-agent rewards
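The environment API itself is not reproduced here. As a hedged illustration of what a two-agent game with per-agent rewards could look like, here is a minimal TicTacToe sketch. The class name and the `reset` / `legal_actions` / `step` methods are hypothetical and are not the interface used in `/games`.

```python
from typing import List, Optional, Tuple


# Hypothetical interface: method names are illustrative, not taken from /games.
class TicTacToeEnv:
    """3x3 TicTacToe with two agents (ids 0 and 1) and per-agent rewards."""

    LINES = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ]

    def __init__(self) -> None:
        self.n_agents = 2
        self.reset()

    def reset(self) -> List[Optional[int]]:
        # Each cell holds None (empty) or the id of the agent that played there.
        self.board: List[Optional[int]] = [None] * 9
        self.current = 0
        self.done = False
        return list(self.board)

    def legal_actions(self) -> List[int]:
        return [i for i, cell in enumerate(self.board) if cell is None]

    def step(self, action: int) -> Tuple[List[Optional[int]], List[float], bool]:
        """Play `action` for the current agent; return (board, rewards, done).

        `rewards` has one entry per agent, so games with more agents or
        non-zero-sum payoffs can use the same signature.
        """
        if self.done or action not in self.legal_actions():
            raise ValueError(f"illegal action {action}")
        self.board[action] = self.current
        rewards = [0.0] * self.n_agents
        if any(all(self.board[i] == self.current for i in line) for line in self.LINES):
            # The mover wins; every other agent is penalised.
            rewards = [-1.0] * self.n_agents
            rewards[self.current] = 1.0
            self.done = True
        elif not self.legal_actions():
            self.done = True  # draw: every agent keeps reward 0
        self.current = (self.current + 1) % self.n_agents
        return list(self.board), rewards, self.done
```

For example, if agents alternate on cells 0, 3, 1, 4, 2, agent 0 completes the top row and the final `step(2)` returns rewards `[1.0, -1.0]` with `done` set to `True`.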
- Run `run_mcts.py` to start.
- Look up `config.py` to change the game and other configurations:
  - `n_iters`: the larger it is, the stronger the neural network becomes; training time increases linearly.
  - `n_eps`: the larger it is, the more robust the training becomes; training time increases linearly.
  - `n_mcts`: the larger it is, the more brute-force search samples MCTS draws; training and testing time increase polynomially.
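To see why `n_iters` and `n_eps` scale training time linearly, it can help to count the search work they multiply. The sketch below assumes (this is not confirmed by `config.py`) that each of the `n_iters` iterations plays `n_eps` self-play games and runs `n_mcts` simulations before every move. The `TrainConfig` dataclass and `self_play_simulations` helper are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical mirror of the three knobs in config.py; only the field names
# come from the README, the dataclass itself is illustrative.
@dataclass
class TrainConfig:
    n_iters: int  # assumed: outer training iterations
    n_eps: int    # assumed: self-play episodes per iteration
    n_mcts: int   # assumed: MCTS simulations per move


def self_play_simulations(cfg: TrainConfig, moves_per_game: int) -> int:
    """Count MCTS simulations run during self-play under the assumptions above.

    Only the number of simulations is modelled; the cost of a single
    simulation (tree descent plus a network evaluation) is not.
    """
    return cfg.n_iters * cfg.n_eps * moves_per_game * cfg.n_mcts
```

For example, `TrainConfig(n_iters=10, n_eps=20, n_mcts=50)` with TicTacToe games of at most 9 moves gives at most 10 × 20 × 9 × 50 = 90,000 simulations, and doubling `n_iters` or `n_eps` doubles that count.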
- add a Q head
- add \alpha for Dirichlet noise
- cyclic learning rate
- PPO for policy
- episodic memory for value
- population based training
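The Dirichlet-noise item refers to exploration noise mixed into the root priors of the search. A minimal sketch, assuming the AlphaZero formulation $P(s,a) = (1-\varepsilon)\,p_a + \varepsilon\,\eta_a$ with $\eta \sim \mathrm{Dir}(\alpha)$ and AlphaZero's $\varepsilon = 0.25$; AlphaZero scaled $\alpha$ inversely with the typical number of legal moves. The helper names `dirichlet` and `add_root_noise` are hypothetical, not part of this repository.

```python
import random
from typing import List, Optional


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample a symmetric Dirichlet(alpha) vector of length k.

    Standard construction: draw k independent Gamma(alpha, 1) variables
    and normalise them so they sum to 1.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def add_root_noise(priors: List[float], alpha: float, eps: float = 0.25,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Mix Dirichlet noise into root priors: (1 - eps) * p + eps * eta."""
    rng = rng or random.Random()
    eta = dirichlet(alpha, len(priors), rng)
    return [(1.0 - eps) * p + eps * n for p, n in zip(priors, eta)]
```

Because both the priors and the noise sum to 1, the mixed vector is still a probability distribution; `eps=0` leaves the priors unchanged.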
- Loss related
- Sampling
- Learning rate
- Meta-train/Hyperparams
- Bandit
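For the cyclic learning rate and learning-rate items, one common choice is the triangular schedule from Leslie Smith's cyclical learning rate work. A minimal sketch follows; the function name `triangular_lr` and its parameters are hypothetical, and nothing here says which schedule this project adopts. If training uses PyTorch, `torch.optim.lr_scheduler.CyclicLR` with `mode="triangular"` provides the same shape.

```python
import math


def triangular_lr(iteration: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over `step_size`
    iterations, falls back to base_lr over the next `step_size`, and repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

With `base_lr=0.001`, `max_lr=0.01` and `step_size=4`, the rate starts at 0.001, peaks at 0.01 at iteration 4, and returns to 0.001 at iteration 8.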
- Python 3.8+
- Refer to `requirement.txt` for dependencies.