Reversi reinforcement learning using the AlphaGo Zero and AlphaZero methods.
- This version of the code is verified to solve the 4x4 Reversi problem using the AlphaGo Zero method.
- This version of the code is verified to train a rather strong 8x8 Reversi model using the AlphaGo Zero method, eventually reaching NTest depth 10+ level. Detailed evaluation results are recorded here.
- This version of the code is verified to train a rather strong 8x8 Reversi model using the AlphaZero method, eventually reaching NTest depth 10+ level. Detailed evaluation results are recorded here.
- Ubuntu / OSX. I have not tested it on Windows.
- Python 3.6
- tensorflow-gpu: 1.3.0+
- Keras: 2.0.8
Below are some examples of usage. Check the code for detailed usage information (src/reversi_zero/manager.py
would be a good entry point).
python3.6 -m src.reversi_zero.run self --env reversi
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --gpu-memory-frac 0.8
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --can-resign False
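The `--can-resign False` flag relates to AlphaGo Zero's resignation mechanism: in the paper, a fraction of self-play games is played out to the end without resigning, so the false-positive rate of the resignation threshold can be measured and the threshold adjusted. A minimal sketch of that calibration idea; the names and constants below are illustrative, not this repo's actual API:

```python
import random

RESIGN_THRESHOLD = -0.9    # resign when the root value drops below this (illustrative)
NO_RESIGN_FRACTION = 0.1   # fraction of games played out fully to check the threshold

def should_disable_resign():
    """Play roughly 10% of games without resignation so the threshold can be audited."""
    return random.random() < NO_RESIGN_FRACTION

def false_positive_rate(checked_games):
    """checked_games: list of (min_value_seen, final_result) from no-resign games,
    with final_result = +1 for a win by the side that would have resigned.
    A false positive is a game that would have been resigned but was actually won."""
    would_resign = [g for g in checked_games if g[0] < RESIGN_THRESHOLD]
    if not would_resign:
        return 0.0
    wrong = sum(1 for g in would_resign if g[1] == +1)
    return wrong / len(would_resign)
```

If the measured false-positive rate grows too high, the threshold is lowered (made more conservative); `--can-resign False` simply forces every game into the play-out-fully group.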
For the AlphaGo Zero way, you can also cache neural network prediction results in memory to speed things up. In my rough test for Reversi, a cache of 30,000,000 entries gives about a 25% speedup, and every 10,000,000 entries uses about 5GB of memory. This cache is disabled by default; to enable it, use --model-cache-size your_size
. (Note that for the AlphaZero way it doesn't help much, because the model changes so often.)
python3.6 -m src.reversi_zero.run self --env reversi --n-workers 4 --model-cache-size 10000000
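The idea behind such a cache is simple: identical board positions recur across MCTS simulations, so their policy/value predictions can be reused instead of recomputed. A minimal sketch of a bounded LRU prediction cache, assuming a hashable board key; this is illustrative and not the repo's actual cache implementation:

```python
from collections import OrderedDict

class PredictionCache:
    """Bounded LRU cache mapping a board key to a (policy, value) prediction."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The speedup depends on the hit rate, which is why a much larger cache (30M vs 10M entries) keeps paying off: deeper transpositions keep getting hits before eviction.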
# Required for self-play in my environment. Maybe you don't need it.
python3.6 -m src.reversi_zero.run res --env reversi
# AlphaZero style
python3.6 -m src.reversi_zero.run opt --env reversi
# AlphaGoZero style
python3.6 -m src.reversi_zero.run opt --env reversi --need-eval True
python3.6 -m src.reversi_zero.run eval
python3.6 -m src.reversi_zero.run eval --n-workers 4
# use the best model
python3.6 -m src.reversi_zero.run http_server --env reversi
# choose a specific generation of model from the console
python3.6 -m src.reversi_zero.run http_server --env reversi --ask-model true
# specify simulations per move
python3.6 -m src.reversi_zero.run http_server --env reversi --n-sims 100
# use a specific generation of model
python3.6 -m src.reversi_zero.run http_server --env reversi --n-steps-model 424000
# set the http port
python3.6 -m src.reversi_zero.run http_server --env reversi --http-port 8888
python3.6 -m src.reversi_zero.run play_gui --env reversi
# show the local GUI while the model runs on another server
pythonw -m src.reversi_zero.run play_gui --env reversi --http-url http://192.168.31.9:8888
NTest is a very strong Reversi AI. We can play against it automatically. Just modify batch.ntest.sh
and run:
. ./batch.ntest.sh
Sometimes I want to compare the strength of different models. To do this, I set up the models in src/reversi_zero/worker/league.py
and run:
python3.6 -m src.reversi_zero.run league --env reversi --n-workers 4
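A common way to turn head-to-head league results into a strength ranking is an Elo update after each game. This is a generic sketch of that calculation, not the scoring code this repo's league worker actually uses:

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, score_a, k=32):
    """score_a: 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    Returns the updated (rating_a, rating_b) pair."""
    ea = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b
```

Because the two updates are symmetric, the total rating mass is conserved; running many league games then separates models by playing strength.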
See records.md in this folder.
- My code is based on @mokemokechicken 's original implementation, which is really great.
- My multi-process idea is borrowed from @akababa 's repo.