reversi-alpha-zero

Reversi reinforcement learning by AlphaGo Zero methods.


About

Reversi reinforcement learning with the AlphaGo Zero and AlphaZero methods.

  • This version of the code is verified to be able to solve the 4x4 Reversi problem the AlphaGo Zero way.
  • This version of the code is verified to be able to train a rather strong 8x8 Reversi model the AlphaGo Zero way, finally reaching the level of NTest at search depth 10+. Detailed evaluation results are recorded here.
  • This version of the code is verified to be able to train a rather strong 8x8 Reversi model the AlphaZero way, finally reaching the level of NTest at search depth 10+. Detailed evaluation results are recorded here.

Environment

  • Ubuntu / OSX. Not tested on Windows.
  • Python 3.6
  • tensorflow-gpu: 1.3.0+
  • Keras: 2.0.8

Download Pretrained Model to Play with

Here

Training Pipeline Examples

Below are some usage examples. Check the code for detailed usage information (src/reversi_zero/manager.py is a good entry point).

Self-Play

python3.6 -m src.reversi_zero.run self --env reversi
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --gpu-memory-frac 0.8
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --can-resign False
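
The --gpu-memory-frac option is what lets several self-play workers share one GPU, by limiting how much GPU memory each worker process may claim. A minimal sketch of how such a limit is typically wired up with TensorFlow 1.x and Keras (the 0.8 value just mirrors the command above; the actual wiring lives in the repo's code):

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Let this worker claim at most 80% of the GPU memory so that
# several worker processes can run on a single GPU.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.8
set_session(tf.Session(config=config))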

For the AlphaGo Zero way, you can also cache neural network prediction results in memory to speed things up. In my rough test for Reversi, a cache of 30,000,000 entries gives about a 25% speedup, and every 10,000,000 entries use about 5GB of memory. The cache is disabled by default; to enable it, use --model-cache-size your_size. (Note that for the AlphaZero way it doesn't help much, because the model changes so often.)

python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --model-cache-size 10000000
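
A minimal sketch of what such a prediction cache can look like: a fixed-capacity dict keyed by the board position, checked before calling the network and filled after. The class, key, and capacity handling here are illustrative assumptions, not the repo's actual implementation.

class PredictionCache:
    # Caches (policy, value) network outputs, keyed by position.
    def __init__(self, max_size):
        self.max_size = max_size
        self.store = {}

    def get(self, key):
        return self.store.get(key)           # None on a cache miss

    def put(self, key, policy, value):
        if len(self.store) < self.max_size:  # simple size cap, no eviction
            self.store[key] = (policy, value)

# Sketch of usage inside the search, with a hypothetical position key:
# key = (black_bitboard, white_bitboard, player_to_move)
# hit = cache.get(key)
# policy, value = hit if hit is not None else model.predict(state)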

Maintaining resignation threshold

# Required for self-play in my environment. Maybe you don't need it.
python3.6 -m src.reversi_zero.run res --env reversi
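
This worker periodically re-estimates the resignation threshold from games played with resignation disabled, in the spirit of AlphaGo Zero's rule of keeping the false-resignation rate low (about 5% in the paper). A rough sketch of that kind of adjustment, with made-up function and variable names:

def update_resign_threshold(no_resign_games, threshold, target_fp_rate=0.05):
    # no_resign_games: for each game played with resignation disabled,
    # True if the eventual winner would have resigned under `threshold`.
    false_positives = sum(1 for wrong in no_resign_games if wrong)
    fp_rate = false_positives / max(1, len(no_resign_games))
    if fp_rate > target_fp_rate:
        return threshold - 0.05               # e.g. -0.80 -> -0.85: resign less eagerly
    return min(threshold + 0.01, -0.50)       # otherwise allow slightly earlier resigns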

Trainer

# AlphaZero style
python3.6 -m src.reversi_zero.run opt --env reversi
# AlphaGoZero style
python3.6 -m src.reversi_zero.run opt --env reversi  --need-eval True

Evaluator

python3.6 -m src.reversi_zero.run eval
python3.6 -m src.reversi_zero.run eval --n-workers 4
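
In the AlphaGo Zero way, the evaluator pits the newest candidate model against the current best model and promotes it only if it wins clearly; in the AlphaZero way this gate is skipped and the latest model is always used for self-play. A sketch of the promotion rule (the 55% win rate is the figure from the AlphaGo Zero paper; the function name is illustrative):

def should_promote(candidate_wins, games_played, win_rate_threshold=0.55):
    # Replace the current best model only if the candidate wins
    # a clear majority of the evaluation games.
    if games_played == 0:
        return False
    return candidate_wins / games_played >= win_rate_threshold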

Play Game Examples

Start Http Server

# use best model
python3.6 -m src.reversi_zero.run http_server --env reversi
# select a specific generation of the model from the console
python3.6 -m src.reversi_zero.run http_server --env reversi --ask-model true
# specify simulations per move
python3.6 -m src.reversi_zero.run http_server --env reversi --n-sims 100
# use a specific generation of the model
python3.6 -m src.reversi_zero.run http_server --env reversi --n-steps-model 424000
# set the http port
python3.6 -m src.reversi_zero.run http_server --env reversi --http-port 8888

Play GUI

python3.6 -m src.reversi_zero.run play_gui --env reversi
# show the local GUI, but run the model on another server
pythonw -m src.reversi_zero.run play_gui --env reversi --http-url http://192.168.31.9:8888

Play with NTest

NTest is a very strong Reversi AI. We can play against it automatically: just modify batch.ntest.sh and run:

. ./batch.ntest.sh

Play between different generations of model

Sometimes I want to compare the strength of different generations of the model, so I set up the models in src/reversi_zero/worker/league.py and run:

python3.6 -m src.reversi_zero.run league --env reversi --n-workers 4

Strength Records

See records.md in this folder.

Credit

  • My code is based on @mokemokechicken's original implementation, which is really great.
  • The multi-process idea is borrowed from @akababa's repo.