Neurohex: A Python repository from kenjyoung

Neurohex uses deep Q-learning with a convolutional neural network to learn to play the game of Hex through self-play. It is written in Python using Theano and some Lasagne.

mentor.py contains a script which performs supervised learning over a pre-scored dataset in order to give the network a decent initialization before Q-learning.

q_learn.py contains a script which trains a network by deep Q-learning through self-play.
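
As a rough illustration of the kind of one-step target that Q-learning through self-play can use in a two-player, zero-sum game without draws such as Hex, here is a minimal sketch. The function name, the +1 win reward, and the negated-maximum rule are assumptions made for illustration; they are not taken from q_learn.py, which may differ.

```python
def q_target(next_q_values, legal_mask, terminal, win_reward=1.0):
    """Hypothetical one-step target for the move just played.

    next_q_values: Q estimates for the opponent's moves in the next position.
    legal_mask:    booleans marking which of those moves are legal.
    terminal:      True if the move just played ended the game.
    """
    if terminal:
        # Assumption: in Hex the move that ends the game wins it for the mover.
        return win_reward
    # The opponent moves next; in a zero-sum game their best value is our loss.
    best_for_opponent = max(q for q, ok in zip(next_q_values, legal_mask) if ok)
    return -best_for_opponent
```

The sign flip is what lets a single network play both sides: every position is evaluated from the point of view of the player to move.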

Before running either of these learning scripts, it is necessary to generate the file data/scoredPositionsFull.npz. This can be done by running the scoreDataSet.py script, which takes around 3 hours and produces a file of around 1.7GB. Alternatively, the file can be copied from this link (https://drive.google.com/file/d/0BwzU100XQElSOUgzVGd2Yy1HRkk/view?usp=sharing); it is too large to upload to GitHub.
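
To check that a generated or downloaded file is intact, it can help to list the arrays it contains. The sketch below uses only numpy's standard .npz interface; the helper name is hypothetical, and the array names stored in the file are not documented here, so the code discovers them rather than assuming them.

```python
import numpy as np

def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    summary = {}
    # np.load on an .npz returns a lazy archive; each array is read on access.
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary
```

For example, `summarize_npz("data/scoredPositionsFull.npz")` would report the contents of the dataset. Note that each array is read into memory in turn, so this may take a while on a file of this size.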

The directory playerAgents contains program.py, an executable Hex agent that makes use of a trained network, which is also included. The included network is inspired by the architecture of the value network of Google DeepMind's AlphaGo. It was trained first by supervised mentoring on a version of a common Hex heuristic based on electrical resistance, over a dataset of positions generated by a strong Hex player called wolve (see https://sourceforge.net/projects/benzene/), and then by self-play. program.py communicates using the GTP protocol (https://www.lysator.liu.se/~gunnar/gtp/) and can be played against using an interface like hexgui (https://github.com/ryanbhayward/hexgui), or simply by typing GTP commands on the command line. It plays best on 13x13, since that is the only size it was trained on; however, it should now be able to play on any board size up to 13x13, in which case it simply sees the extra cells as filled in appropriately.
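
For anyone scripting against the agent rather than typing commands by hand, the GTP framing is simple: a command is one line with an optional numeric id, and a response starts with "=" on success or "?" on failure and ends with a blank line. The helpers below are a minimal sketch of that framing only. Their names are hypothetical, and the command used in the usage note is an example; which commands program.py accepts is not specified here.

```python
def format_command(name, *args, cmd_id=None):
    """Build one GTP command line: [id] name [arguments...] newline."""
    parts = [] if cmd_id is None else [str(cmd_id)]
    parts.append(name)
    parts.extend(str(a) for a in args)
    return " ".join(parts) + "\n"

def parse_response(text):
    """Split a GTP response into (success, id_or_None, body)."""
    text = text.strip()
    if not text or text[0] not in "=?":
        raise ValueError("not a GTP response: %r" % text)
    success = text[0] == "="
    rest = text[1:]
    # An optional numeric id follows the status character directly.
    digits = ""
    while rest and rest[0].isdigit():
        digits += rest[0]
        rest = rest[1:]
    cmd_id = int(digits) if digits else None
    return success, cmd_id, rest.strip()
```

With these, `format_command("genmove", "black", cmd_id=3)` produces the line to write to the agent's standard input, and `parse_response` can be applied to whatever text comes back up to the terminating blank line.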

The default player agent now uses the network to perform a tree search similar to MCTS, but with network evaluations in place of rollouts. The "agent" command can also be used to switch between the other available agents, including one that uses the network alone and one (very weak) agent that uses the resistance heuristic from the supervised mentoring stage of training.
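
The general idea of replacing rollouts with an evaluation function can be sketched as follows. This is a generic UCT-style search written for illustration, not the search implemented in this repository: the class, the function, and the three callbacks (`legal_moves`, `apply_move`, `evaluate`) are all hypothetical. It assumes `evaluate(state)` returns a value in [-1, 1] from the point of view of the player to move, and that `legal_moves` returns an empty list once the game is over.

```python
import math

class Node:
    def __init__(self):
        self.visits = 0
        self.total = 0.0    # summed value for the player who moved into this node
        self.children = {}  # move -> Node

    def mean(self):
        return self.total / self.visits if self.visits else 0.0

def tree_search(root_state, legal_moves, apply_move, evaluate,
                iterations=200, c=1.0):
    """Return the most-visited root move after a fixed number of iterations."""
    root = Node()
    for _ in range(iterations):
        node, state, path = root, root_state, [root]
        # Selection: descend by an upper confidence bound until reaching a leaf.
        while node.children:
            log_n = math.log(node.visits + 1)
            move, node = max(
                node.children.items(),
                key=lambda kv: kv[1].mean()
                + c * math.sqrt(log_n / (kv[1].visits + 1)),
            )
            state = apply_move(state, move)
            path.append(node)
        # Expansion: add every legal move (none if the game is over).
        for move in legal_moves(state):
            node.children[move] = Node()
        # Evaluation: one call to the evaluator stands in for a random rollout.
        value = evaluate(state)
        # Backup: flip the sign at each level, since the players alternate.
        for visited in reversed(path):
            value = -value
            visited.visits += 1
            visited.total += value
    if not root.children:
        raise ValueError("no legal moves at the root")
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

In a real agent, `evaluate` would wrap a forward pass of the trained network; here it can be any function with the stated sign convention.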

To use the code it is necessary to install numpy and theano.
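
A quick way to confirm that the required packages are importable before starting a long training run is sketched below. The helper name is hypothetical; it relies only on the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util

def missing_packages(names=("numpy", "theano")):
    """Return the names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

An empty list from `missing_packages()` means both packages were found in the current environment.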

A paper on Neurohex can be found here: http://arxiv.org/abs/1604.07097

LICENSE
=======
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
