This is a tutorial written for Caffe2 which mocks google AlphaGo Fan and AlphaGO Zero. v0.2.0 is released, with ResNet based AlphaGo Zero model.
This program by so far relies on RocAlphaGo Cython implementation for feature preprocessing and Go rules. Cython compilation can be done by running shell command python setup.py build_ext --inplace
.
The Go game dataset are usually stored in SGF file format. We need to transform SGF file into Caffe2 Tensor. AlphaGo Zero requires 17 feature planes of 19x19 size, which does not include 'human knowledge' like Liberties or Escape.
This preprocess program still relies on RocAlphaGo for Go rules, but no more dependencies for feature generation. I'm looking for a better(more accurate) Go rule implementation which can support Chinese/Korean/Japanese Go rules and different Komi, please feel free to recommend.
The Supervised Learning program is used to evaluate whether the network architecture is correct. Due to a bug in Caffe2 spatial_BN op, the program cannot resume from previous run. Since each epoch requires 200~250 GPU hours, thus it's not viable to run it on personal computer.
epochs | LR | loss | train/test accu | epochs | LR | loss | train/test accu |
---|---|---|---|---|---|---|---|
0.2 | 0.1 | - | - / 0.1698 | 11 | / | ||
0.4 | / | 12 | / | ||||
0.6 | / | 13 | / | ||||
0.8 | / | 14 | / | ||||
1 | / | 15 | / | ||||
6 | / | 16 | / | ||||
7 | / | 17 | / | ||||
8 | / | 18 | / | ||||
9 | / | 19 | / | ||||
10 | / | * | 0.60/0.57(alphago zero) |
On going. This will be different from AlphaGo Fan in may ways:
1. Always use the best primary player to generate data.
2. Before each move, do wide search to obtain better distribution than Policy predict.
3. MCTS only relies on Policy and Value network, no more Rollout.
4. more detail will be added during implementation
The Go game dataset are usually stored in SGF file format. We need to transform SGF file into Caffe2 Tensor which are 48 feature planes of 19x19 size, according to DeepMind.
The preprocess program relies on Cython
implementation of RocAlphaGo project for Go rules and feature plane generation. It is estimated to take 60 CPU hours for preprocess complete KGS data set.
According to DeepMind, AlphaGo can achieve 55.4% test accuracy after 20 epochs training. Test set is the first 1 million steps. i.e. KGS2004. The speed of each prediction is 4.8ms (on Kepler K40 GPU).
This program achieves 52.83% by 11 epochs so far. Test set is the latest 1M steps. i.e.KGS201705-KGS201709. It also achieved speed of around 4.5ms for each single prediction (on Maxwell GTX980m GPU). Therefore each epochs takes ~40 GPU hours. Running on GPU mode is around 100x faster than CPU mode.
epochs | LR | loss | train/test accu | epochs | LR | loss | train/test accu |
---|---|---|---|---|---|---|---|
1 | 0.003 | 1.895 | 0.4800 / 0.4724 | 11 | 0.0002 | 1.5680 | 0.5416 / 0.5283 |
2 | 0.003 | 1.7782 | 0.5024 / 0.4912 | 12 | 0.0001 | 1.5639 | 0.5424 / 0.5291 |
3 | 0.002 | 1.7110 | 0.5157 / 0.5029 | 13 | / | ||
4 | 0.002 | 1.6803 | 0.5217 / 0.5079 | 14 | / | ||
5 | 0.002 | 1.6567 | - / 0.5119 | 15 | / | ||
6 | 0.002 | 1.6376 | 0.5302 / 0.5146 | 16 | / | ||
7 | 0.001 | 1.6022 | 0.5377 / 0.5202 | 17 | / | ||
8 | 0.0005 | 1.5782 | - / 0.5273 | 18 | / | ||
9 | 0.0005 | 1.6039 | 0.5450 / 0.5261 | 19 | / | ||
10 | 0.0002 | 1.5697 | 0.5447 / 0.5281 | 20 | 0.569/0.554(alphago) |
The training accuracy record of epoch 5/8 were lost.
Intel Broadwell CPU can provide around 30 GFlops compute power per core. Nvidia Kepler K40 and Maxwell GTX980m GPU can provide around 3 TFlops compute power.
The RL program is runnable now but still under evaluation. It also relies on RocAlphaGo project for Go rules by now. A new program is under construction to implement first 12 features in GPU mode to replace RocAlphaGo. It is believed to be at least 10x faster than RocAlphaGo(python implementation).
tbd. Depends on Reinforced Learning to generate 30 millions games. And pick 1 state of each game.
tbd. AlphaGo achieved 24.2% of accuracy and 2us of speed.
tbd. Depends on Fast Rollout.