Reversi reinforcement learning using the AlphaGo Zero and AlphaZero methods.
- This version of the code is verified to solve the 4x4 Reversi problem using the AlphaGo Zero method.
- This version of the code is verified to train a rather strong 8x8 Reversi model using the AlphaGo Zero method, eventually reaching NTest depth 10+ level. Detailed evaluation results are recorded here.
- This version of the code is verified to train a rather strong 8x8 Reversi model using the AlphaZero method, eventually reaching NTest depth 10+ level. Detailed evaluation results are recorded here.
- Ubuntu / OSX. I have not tested it on Windows.
- Python 3.6
- tensorflow-gpu: 1.3.0+
- Keras: 2.0.8
Below are some examples of usage. Check the code for detailed usage information (src/reversi_zero/manager.py
would be a good entry point).
python3.6 -m src.reversi_zero.run self --env reversi
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --gpu-memory-frac 0.8
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --can-resign False
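The `--can-resign False` flag relates to AlphaGo Zero's resignation mechanism: in the paper, a fraction of self-play games is played out to the end without resigning, so the false-positive rate of the resignation threshold can be measured and the threshold adjusted. A minimal sketch of that calibration idea; the names and constants below are illustrative, not this repo's actual API:

```python
import random

RESIGN_THRESHOLD = -0.9    # resign when the root value drops below this (illustrative)
NO_RESIGN_FRACTION = 0.1   # fraction of games played out fully to check the threshold

def should_disable_resign():
    """Play roughly 10% of games without resignation so the threshold can be audited."""
    return random.random() < NO_RESIGN_FRACTION

def false_positive_rate(checked_games):
    """checked_games: list of (min_value_seen, final_result) from no-resign games,
    with final_result = +1 for a win by the side that would have resigned.
    A false positive is a game that would have been resigned but was actually won."""
    would_resign = [g for g in checked_games if g[0] < RESIGN_THRESHOLD]
    if not would_resign:
        return 0.0
    wrong = sum(1 for g in would_resign if g[1] == +1)
    return wrong / len(would_resign)
```

If the measured false-positive rate grows too high, the threshold is lowered (made more conservative); `--can-resign False` simply forces every game into the play-out-fully group.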
For the AlphaGo Zero way, you can also cache neural network prediction results in memory to speed things up. In my rough test for Reversi, a cache of 30,000,000 entries gives about a 25% speedup, and every 10,000,000 entries uses about 5GB of memory. This cache is disabled by default; to enable it, use --model-cache-size your_size
. (Note that for the AlphaZero way it doesn't help much, because the model changes so often.)
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --model-cache-size 10000000
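The idea behind such a cache is simple: identical board positions recur across MCTS simulations, so their policy/value predictions can be reused instead of recomputed. A minimal sketch of a bounded LRU prediction cache, assuming a hashable board key; this is illustrative and not the repo's actual cache implementation:

```python
from collections import OrderedDict

class PredictionCache:
    """Bounded LRU cache mapping a board key to a (policy, value) prediction."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The speedup depends on the hit rate, which is why a much larger cache (30M vs 10M entries) keeps paying off: deeper transpositions keep getting hits before eviction.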
# Required for self-play in my environment. Maybe you don't need it.
python3.6 -m src.reversi_zero.run res --env reversi
# AlphaZero style
python3.6 -m src.reversi_zero.run opt --env reversi
# AlphaGoZero style
python3.6 -m src.reversi_zero.run opt --env reversi --need-eval True
python3.6 -m src.reversi_zero.run eval
python3.6 -m src.reversi_zero.run eval --n-workers 4
# use the best model
python3.6 -m src.reversi_zero.run http_server --env reversi
# choose a specific generation of model from the console
python3.6 -m src.reversi_zero.run http_server --env reversi --ask-model true
# specify simulations per move
python3.6 -m src.reversi_zero.run http_server --env reversi --n-sims 100
# use a specific generation of model
python3.6 -m src.reversi_zero.run http_server --env reversi --n-steps-model 424000
# set the http port
python3.6 -m src.reversi_zero.run http_server --env reversi --http-port 8888
python3.6 -m src.reversi_zero.run play_gui --env reversi
# show the local GUI while the model runs on another server
pythonw -m src.reversi_zero.run play_gui --env reversi --http-url http://192.168.31.9:8888
NTest is a very strong Reversi AI. We can play against it automatically. Just modify batch.ntest.sh
and run:
. ./batch.ntest.sh
Sometimes I want to compare the strength of different models. To do this, I set up the models in src/reversi_zero/worker/league.py
and run:
python3.6 -m src.reversi_zero.run league --env reversi --n-workers 4
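A common way to turn head-to-head league results into a strength ranking is an Elo update after each game. This is a generic sketch of that calculation, not the scoring code this repo's league worker actually uses:

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, score_a, k=32):
    """score_a: 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    Returns the updated (rating_a, rating_b) pair."""
    ea = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b
```

Because the two updates are symmetric, the total rating mass is conserved; running many league games then separates models by playing strength.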
See records.md in this folder.
- My code is based on @mokemokechicken 's original implementation, which is really great.
- My multi-process idea is borrowed from @akababa 's repo.