- Designed a reinforcement learning algorithm based on AlphaGo Zero for playing Chinese chess
- Optimized the original AlphaGo Zero algorithm by 1) adapting the data structures to Chinese chess games, 2) pruning branches of the self-play search tree so the fixed playout budget reaches deeper positions (a pruning sketch follows this list), 3) enriching the reward and penalty scheme, and 4) finding a well-performing combination of value and policy networks
- Enabled the resulting agent to surpass most human players
- Demos for the entire SRTP, which included three studies, together with the chess_simulator and chess_vision components
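The branch pruning mentioned in optimization 2) can be illustrated with a short sketch. This is a minimal example under assumed names (`MCTSNode`, `expand_pruned`, `policy_priors`, and `top_k` are hypothetical, not identifiers from this repository): during node expansion, only the moves with the highest policy-network priors are kept, so the fixed playout budget is spent on deeper lines.

```python
# Minimal sketch of prior-based branch pruning during MCTS expansion.
# All names here (MCTSNode, policy_priors, top_k) are illustrative, not the
# project's actual identifiers.

class MCTSNode:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy head
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # move -> MCTSNode

def expand_pruned(node, policy_priors, top_k=20):
    """Expand a node but keep only the top-k moves by network prior.

    Dropping low-prior branches concentrates the playout budget on
    promising lines, which lets the search reach deeper positions.
    """
    best_moves = sorted(policy_priors.items(),
                        key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in best_moves) or 1.0
    for move, p in best_moves:
        node.children[move] = MCTSNode(prior=p / total)  # renormalize priors
    return node
```

Renormalizing the surviving priors keeps the exploration term of the search on the same scale as before pruning.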
- Environment
Python 3.6.13
cuda 11.2
tensorflow-gpu 2.6.2
pillow 8.4.0
scipy
uvloop
- Usage
```bash
# Just play
python main.py --mode play --processor gpu --num_gpus 1 --ai_function mcts --ai_count 1

# Train with multiple processes
python main.py --mode distributed_train --processor gpu --train_playout 400 --res_block_nums 9 --train_epoch 100 --batch_size 256 --mcts_num 8

# Evaluate (compute Elo rating)
python main.py --mode eval --processor gpu --play_playout 40 --res_block_nums 9 --eval_num 1 --game_num 10
```
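For context, `--mode eval` reports an Elo rating. The standard Elo update is shown in the sketch below; the K-factor of 32 and the helper names are assumptions chosen for illustration, not details taken from the evaluation code.

```python
# Hedged sketch of a standard Elo update; the K-factor and helper names are
# assumptions for illustration, not taken from the eval code itself.

def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=32):
    """Return A's new rating after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    expected = elo_expected_score(rating_a, rating_b)
    return rating_a + k * (score_a - expected)

# Example: a 1500-rated candidate beats a 1600-rated baseline.
print(elo_update(1500, 1600, 1.0))  # ~1520.5
```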
- Self-Play & Network Training
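Self-play produces training triples (board state, MCTS visit distribution pi, game outcome z), and the network is fitted to them with the AlphaGo Zero-style combined objective. The TensorFlow sketch below shows that objective; the function name and the L2 coefficient are assumptions, not the project's actual code.

```python
# Sketch of the AlphaGo Zero-style combined loss used to train on self-play
# data (state, pi, z); the variable names and L2 coefficient here are
# assumptions, not copied from the project code.
import tensorflow as tf

def combined_loss(pi_target, z_target, policy_logits, value_pred, model,
                  l2_coef=1e-4):
    """Cross-entropy on the MCTS visit distribution pi, MSE on the game
    outcome z, plus L2 regularization of the network weights."""
    policy_loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=pi_target, logits=policy_logits)
    value_loss = tf.square(z_target - tf.squeeze(value_pred, axis=-1))
    l2_loss = l2_coef * tf.add_n(
        [tf.nn.l2_loss(v) for v in model.trainable_variables])
    return tf.reduce_mean(policy_loss + value_loss) + l2_loss
```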
- Real Play (Net + MCTS)
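During real play the trained network guides MCTS through the PUCT selection rule, which trades off the backed-up value estimate Q against the network prior P. The sketch below is illustrative only and reuses the hypothetical node fields from the pruning example above (`prior`, `visit_count`, `value_sum`, `children`); the `c_puct` constant is likewise an assumption.

```python
# Sketch of the PUCT rule that blends the value estimate Q with the network
# prior P during play; c_puct and the node fields are illustrative names.
import math

def select_child(node, c_puct=5.0):
    """Pick the child maximizing Q(s, a) + U(s, a), as in AlphaGo Zero."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_move, best_score = None, -float("inf")
    for move, child in node.children.items():
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```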
- Network Structure
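The network follows the AlphaGo Zero residual tower with separate policy and value heads; `--res_block_nums` sets the number of residual blocks. The Keras sketch below is an illustration under stated assumptions: the 10x9 board shape matches Chinese chess, but the number of input feature planes (14) and the size of the move space (2086) are placeholders rather than values confirmed from the code.

```python
# Hedged sketch of an AlphaGo Zero-style residual network for a 10x9 Chinese
# chess board; the number of input planes (14) and the action-space size
# (2086) are assumptions used only to make the example concrete.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=128):
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([shortcut, x])
    return layers.ReLU()(x)

def build_network(res_block_nums=9, filters=128, planes=14, num_actions=2086):
    board = layers.Input(shape=(10, 9, planes))
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(board)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for _ in range(res_block_nums):          # --res_block_nums controls depth
        x = residual_block(x, filters)

    # Policy head: logits over all move indices.
    p = layers.Conv2D(2, 1, use_bias=False)(x)
    p = layers.BatchNormalization()(p)
    p = layers.ReLU()(p)
    policy = layers.Dense(num_actions)(layers.Flatten()(p))

    # Value head: a scalar in [-1, 1] predicting the game outcome.
    v = layers.Conv2D(1, 1, use_bias=False)(x)
    v = layers.BatchNormalization()(v)
    v = layers.ReLU()(v)
    v = layers.Dense(256, activation="relu")(layers.Flatten()(v))
    value = layers.Dense(1, activation="tanh")(v)

    return tf.keras.Model(inputs=board, outputs=[policy, value])
```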
- Reference
[1] Silver, David, Schrittwieser, Julian, Simonyan, Karen, Antonoglou, Ioannis, Huang, Aja, Guez, Arthur, Hubert, Thomas, Baker, Lucas, Lai, Matthew, Bolton, Adrian, Chen, Yutian, Lillicrap, Timothy, Hui, Fan, Sifre, Laurent, van den Driessche, George, Graepel, Thore, and Hassabis, Demis. "Mastering the game of Go without human knowledge." Nature 550 (2017): 354-359.