Holmgång is a collection of scripts for evaluating backgammon Match Equity Tables (METs).
It lets you run a large number of matches between two GNU Backgammon bots, each playing according to its own MET.
The match outcomes, and the metrics derived from them, make it possible to reach a verdict about which of the two METs is superior.
Holmgång runs three instances of GNU Backgammon: one server, which hosts the match (effectively the table itself) and generates the dice roll sequences, and two clients, which connect to the table as external players through assigned ports on localhost.
Holmgång communicates with these instances through standard input/output: it sends the commands needed to set up, start and log matches, as if a user had typed them.
First you have to create a directory where the logs of the matches will be saved.
A config.yml file containing the simulation parameters has to be created in that directory. For a valid example, see the example_setups directory.
Then you can start running the simulations by issuing the following command from the root directory of Holmgång:
bash holmgang_linux_launcher.sh path/to/your/directory/
config_version (integer) - current value: 1
match_indices - defines the range of matches to be run: the match index is saved into the match log's filename
  start (integer) - the match index of the first match to run
  end (integer) - the match index of the last match to run
superseed (32-bit integer) - used to generate pseudo-random seeds for individual matches: two matches with identical superseed and match index will use the same dice sequence
rematches (true/false)
match_length (positive integer)
met_dir (string) - path to the directory where XMLs of METs are stored
client_0 & client_1 - both have the following subfields that describe a player
  name (string) - the name of the player which will be saved to match logs
  met_file (string) - XML filename (with extension)
  cube-ply (integer) - sets how many plies to look ahead for cube decisions
  checker-ply (integer) - sets how many plies to look ahead for checker play
  port (16-bit integer) - the port which the external player will use to connect to the table
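To illustrate the field list above, the following sketch mirrors the structure of a config.yml as a Python dictionary and checks the documented types and ranges. It is not part of Holmgång: the `validate_config` helper is hypothetical, every concrete value (paths, MET file names, ports, match length, seed) is a placeholder, and treating superseed as an unsigned 32-bit value and requiring start <= end are assumptions.

```python
# Hypothetical config mirroring the documented fields; all values are placeholders.
example_config = {
    "config_version": 1,
    "match_indices": {"start": 0, "end": 9},
    "superseed": 12345,
    "rematches": True,
    "match_length": 5,
    "met_dir": "path/to/mets/",
    "client_0": {
        "name": "Woolsey 0-ply",
        "met_file": "woolsey.xml",
        "cube-ply": 0,
        "checker-ply": 0,
        "port": 10000,
    },
    "client_1": {
        "name": "Zadeh 0-ply",
        "met_file": "zadeh.xml",
        "cube-ply": 0,
        "checker-ply": 0,
        "port": 10001,
    },
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(cfg):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not _is_int(cfg.get("config_version")):
        problems.append("config_version must be an integer")

    indices = cfg.get("match_indices") or {}
    start, end = indices.get("start"), indices.get("end")
    if not (_is_int(start) and _is_int(end)):
        problems.append("match_indices.start and .end must be integers")
    elif start > end:  # assumption: an empty or reversed range is an error
        problems.append("match_indices.start must not exceed .end")

    seed = cfg.get("superseed")
    if not (_is_int(seed) and 0 <= seed < 2**32):  # assumption: unsigned
        problems.append("superseed must fit in 32 bits")
    if not isinstance(cfg.get("rematches"), bool):
        problems.append("rematches must be true or false")
    length = cfg.get("match_length")
    if not (_is_int(length) and length > 0):
        problems.append("match_length must be a positive integer")
    if not isinstance(cfg.get("met_dir"), str):
        problems.append("met_dir must be a string")

    for key in ("client_0", "client_1"):
        client = cfg.get(key) or {}
        for field in ("name", "met_file"):
            if not isinstance(client.get(field), str):
                problems.append(f"{key}.{field} must be a string")
        for field in ("cube-ply", "checker-ply"):
            if not _is_int(client.get(field)):
                problems.append(f"{key}.{field} must be an integer")
        port = client.get("port")
        if not (_is_int(port) and 0 <= port <= 65535):
            problems.append(f"{key}.port must be a 16-bit integer")
    return problems


print(validate_config(example_config))
```

A check like this could be run before launching a long simulation, so that a mistyped port or a missing field is caught early rather than after the GNU Backgammon instances have started.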
The printout shows the number of matches won by each player.
A third count covers ties. A match is counted as a tie if the score reaches either '1 away - 1 away' or '2 away - 2 away' at any point during the match; such a match does not add to either player's win count, regardless of its actual outcome.
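The tie rule can be sketched as follows. This is only an illustration of the rule as described here, not Holmgång's own code: the `classify_match` function and its input format (a list of point totals after each game) are hypothetical, and whether the starting score of a match also counts as a score "during the match" is left to the caller.

```python
# Away scores that turn a match into a tie:
# '1 away - 1 away' and '2 away - 2 away'.
TIE_SCORES = {(1, 1), (2, 2)}


def to_away(points, match_length):
    """Convert points won so far into points still needed to win the match."""
    return match_length - points


def classify_match(score_history, match_length):
    """Return 'T' (tie), 'W' (White wins) or 'B' (Black wins).

    score_history is a list of (white_points, black_points) tuples,
    one per completed game, ending with the final score.
    """
    for white, black in score_history:
        away = (to_away(white, match_length), to_away(black, match_length))
        if away in TIE_SCORES:
            # The match counts as a tie regardless of who wins in the end.
            return "T"
    white, black = score_history[-1]
    return "W" if white > black else "B"


# 5-point match that passes through 3-3 (2 away - 2 away) before Black wins.
print(classify_match([(2, 0), (2, 3), (3, 3), (3, 5)], 5))
# 5-point match that never reaches 1a1a or 2a2a.
print(classify_match([(1, 0), (1, 4), (1, 5)], 5))
```

The first call is classified as a tie even though Black wins the match, which is exactly the "regardless of the match's actual outcome" case.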
If you run rematches, a one-sample t-test is run to check whether one of the METs plays significantly stronger than the other.
The results of matchpairs (each consisting of a match and its rematch with the same dice sequence but roles reversed) are displayed as a cross table. Rows correspond to the result of the first-leg match, columns to the result of the second-leg match. The abbreviations B, W and T stand for Black, White and Tie, respectively.
Every matchpair yields a score between -2 and 2: the sum of the scores of its two matches, where a win, a tie and a loss are worth 1, 0 and -1 points, respectively. The t-statistic is calculated on the sample of these scores. The p-value is then calculated from the t-statistic and the degrees of freedom. Finally, a verdict is printed, using 0.05 as the significance threshold.
(The example below should match the output of the first leg for the example config provided)
leg score:
Woolsey 0-ply (White / player 0 / on top): 3
Zadeh 0-ply (Black / player 1 / bottom): 7
Ties (scores 1a1a & 2a2a): 0
Total: 3.0 - 7.0
(The example below should match the output of the t-test for the example config provided)
H0: aggregated scores' average is 0
Matchpair outcomes frequency table:
   B  T  W
B  4  2  1
T  0  0  0
W  1  0  2
t-statistics: -0.6123724356957945
p-value: 0.5554454421055857
H0 cannot be rejected ~ none of the METs is significantly stronger than the other
Note that the row sums of the cross table are 7, 0 and 3 (for B, T and W), matching the first-leg scores in the previous output.
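As a worked check, the t-statistic above can be recomputed from the cross table with the Python standard library alone. The sketch assumes that scores are taken from the point of view of player 0, who plays White in the first leg and Black in the second leg; the variable names are illustrative and not taken from Holmgång. The p-value is left out because the standard library has no t-distribution; it would be derived from the t-statistic with n - 1 = 9 degrees of freedom.

```python
import math
import statistics

# Matchpair frequencies copied from the cross table above;
# keys are (first-leg result, second-leg result).
crosstab = {
    ("B", "B"): 4, ("B", "T"): 2, ("B", "W"): 1,
    ("T", "B"): 0, ("T", "T"): 0, ("T", "W"): 0,
    ("W", "B"): 1, ("W", "T"): 0, ("W", "W"): 2,
}

# Assumption: player 0 is White in the first leg and Black in the second,
# so a White win scores +1 in leg one and -1 in leg two.
FIRST_LEG = {"W": 1, "T": 0, "B": -1}
SECOND_LEG = {"B": 1, "T": 0, "W": -1}

# Expand the frequency table into one score per matchpair.
scores = []
for (first, second), count in crosstab.items():
    scores.extend([FIRST_LEG[first] + SECOND_LEG[second]] * count)

n = len(scores)
mean = statistics.mean(scores)
# Standard error of the mean, using the sample standard deviation.
std_err = statistics.stdev(scores) / math.sqrt(n)
t_statistic = mean / std_err

print(f"n = {n}, t = {t_statistic:.6f}")
```

Under these assumptions the ten matchpair scores are six 0s, two -1s, one -2 and one 2, and the resulting t-statistic agrees with the value printed in the example output.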
Holmgång has only been tested on Ubuntu 18.04.
It needs Python 3, Bash and GNU Backgammon installed.
As of March 2020, GNU Backgammon has to be compiled manually, because no official release yet contains the necessary fixes introduced in the following revisions:
/gnubg/external.c 1.101
/gnubg/play.c 1.463
/gnubg/external.h 1.25
/gnubg/drawboard.c 1.82