Chess reinforcement learning by AlphaGo Zero methods.
This project is based on two main resources:
- DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge.
- The great Reversi implementation of the DeepMind ideas that @mokemokechicken developed in his repo: https://github.com/mokemokechicken/reversi-alpha-zero
Note: This project is still under construction!!
- Python 3.6.3
- tensorflow-gpu: 1.3.0
- Keras: 2.0.8
I've added a new supervised learning (SL) step to the pipeline, which uses the human game files (PGN) available on the internet as a play-data generator. This SL step was also used in the first, original version of AlphaGo, and chess may be complex enough that the policy model has to be pre-trained before starting the self-play process (i.e., chess may be too complicated for self-training alone).
Using the new SL process is as simple as running the new worker "sl" at the beginning instead of the worker "self". Once the model has converged enough on SL play-data, just stop the worker "sl" and start the worker "self", so the model keeps improving on self-play data.
If you want to use this new SL step, you will have to download big PGN files (chess game files) from the internet and paste them into the "data/play_data" folder.
python src/chess_zero/run.py sl
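Before starting the "sl" worker it can be useful to check what the downloaded files contain. The following is a minimal sketch, not part of this project: it assumes the files use the `.pgn` extension and relies only on the standard PGN `[Result "..."]` tag to count games. The function name `summarize_pgn_folder` is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# PGN tag pairs look like: [Result "1-0"]
RESULT_TAG = re.compile(r'^\[Result\s+"([^"]*)"\]', re.MULTILINE)


def summarize_pgn_folder(folder):
    """Count games per result for every *.pgn file in `folder`.

    Hypothetical helper: it only inspects PGN headers and does not
    validate the moves themselves.
    """
    summary = {}
    for path in sorted(Path(folder).glob("*.pgn")):
        text = path.read_text(encoding="utf-8", errors="replace")
        summary[path.name] = Counter(RESULT_TAG.findall(text))
    return summary
```

Calling `summarize_pgn_folder("data/play_data")` returns one counter per file, so the number of games in a file is the sum of that counter's values.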
Now it's possible to train the model in a distributed way. The only thing needed is to use the new parameter:
- --type distributed: use mini config for testing (see src/chess_zero/configs/distributed.py)
So, in order to contribute to the distributed team you just need to run the three workers locally like this:
python src/chess_zero/run.py self --type distributed (or python src/chess_zero/run.py sl --type distributed)
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed
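The three commands above can also be assembled programmatically, for example to start each worker from a launcher script. This is only a sketch: the helper names `worker_command` and `distributed_commands` are hypothetical, and the script path and flags are copied from the commands above.

```python
import sys

RUN_SCRIPT = "src/chess_zero/run.py"  # path taken from the commands above


def worker_command(worker, config_type=None):
    """Build the argv list for one worker: 'self', 'sl', 'opt' or 'eval'."""
    cmd = [sys.executable, RUN_SCRIPT, worker]
    if config_type is not None:
        cmd += ["--type", config_type]
    return cmd


def distributed_commands(use_sl=False):
    """Return the argv lists for the three workers of a distributed run."""
    first = "sl" if use_sl else "self"
    return [worker_command(w, "distributed") for w in (first, "opt", "eval")]
```

Each returned list can be passed to `subprocess.Popen` from the project root to run that worker in its own process.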
This AlphaGo Zero implementation consists of three workers: self, opt and eval.
- self is Self-Play, which generates training data by self-play using BestModel.
- opt is Trainer, which trains the model and generates next-generation models.
- eval is Evaluator, which evaluates whether the next-generation model is better than BestModel. If it is better, it replaces BestModel.
For evaluation, you can play chess with the BestModel.
- play_gui is Play Game vs BestModel using ASCII character encoding.
- data/model/model_best_*: BestModel.
- data/model/next_generation/*: next-generation models.
- data/play_data/play_*.json: generated training data.
- logs/main.log: log file.
If you want to train the model from the beginning, delete the above directories.
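Before deleting anything, it may help to see which of these files currently exist. A minimal sketch, assuming it is run against the project root; the glob patterns are copied from the list above and the function name `inventory` is hypothetical.

```python
import os
from glob import glob

# Glob patterns copied from the list above, relative to the project root.
DATA_PATTERNS = {
    "best_model": "data/model/model_best_*",
    "next_generation": "data/model/next_generation/*",
    "play_data": "data/play_data/play_*.json",
    "log": "logs/main.log",
}


def inventory(project_root="."):
    """Return {label: sorted list of existing paths} for each pattern."""
    return {
        label: sorted(glob(os.path.join(project_root, pattern)))
        for label, pattern in DATA_PATTERNS.items()
    }
```

An empty list for `best_model` means no BestModel has been saved yet.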
pip install -r requirements.txt
If you want to use a GPU,
pip install tensorflow-gpu
Create a .env file and write this.
KERAS_BACKEND=tensorflow
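To illustrate what such a file does, here is a small standard-library sketch of how KEY=VALUE lines can be turned into environment variables. It is an assumption about the mechanism, not this project's loader (which may well use a dedicated library), and `load_env_file` is a hypothetical name.

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from `path` and export them via os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments and malformed lines
            key, value = line.split("=", 1)
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```

Keras picks its backend from the KERAS_BACKEND environment variable when it is imported, so a loader like this has to run before `import keras`.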
To train the model, execute Self-Play, Trainer and Evaluator.
python src/chess_zero/run.py self
When executed, Self-Play will start using BestModel. If BestModel does not exist, a new random model will be created and become BestModel.
- --new: create new BestModel
- --type mini: use mini config for testing (see src/chess_zero/configs/mini.py)
python src/chess_zero/run.py opt
When executed, training will start. The base model will be loaded from the latest saved next-generation model. If none exists, BestModel is used. The trained model will be saved every 2000 steps (mini-batches) after epoch.
- --type mini: use mini config for testing (see src/chess_zero/configs/mini.py)
- --total-step: specify total step (mini-batch) numbers. The total step affects learning rate of training.
python src/chess_zero/run.py eval
When executed, Evaluation will start. It evaluates BestModel and the latest next-generation model by playing about 200 games. If next-generation model wins, it becomes BestModel.
- --type mini: use mini config for testing (see src/chess_zero/configs/mini.py)
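The replacement rule can be sketched as a simple score threshold over the evaluation games. Both the 0.55 threshold (the margin reported in the AlphaGo Zero paper) and the treatment of a draw as half a win are assumptions for illustration; this README does not state the values the project uses, and `next_generation_wins` is a hypothetical name.

```python
def next_generation_wins(wins, losses, draws=0, threshold=0.55):
    """Decide whether the next-generation model should replace BestModel.

    `threshold` and the half-point for draws are assumed values,
    not taken from this project's configuration.
    """
    games = wins + losses + draws
    if games == 0:
        return False  # no evidence yet, keep the current BestModel
    score = (wins + 0.5 * draws) / games
    return score >= threshold
```

For example, 120 wins and 80 losses over 200 games give a score of 0.6, which clears the assumed threshold, while an even 100 to 100 split does not.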
python src/chess_zero/run.py play_gui
When executed, an ordinary chess board will be displayed in ASCII characters and you can play against BestModel.
Usually a lack of memory causes warnings, not errors.
If an error happens, try changing per_process_gpu_memory_fraction in src/worker/{evaluate.py,optimize.py,self_play.py}:
tf_util.set_session_config(per_process_gpu_memory_fraction=0.2)
A smaller batch_size will reduce the memory usage of opt.
Try changing TrainerConfig#batch_size in NormalConfig.
The following table records the best models.
best model generation | winning percentage to best model | Time Spent (hours) | note |
---|---|---|---|
1 | - | - | |