BanditLib: a simple multi-armed bandit library 

.-. .-')     ('-.         .-') _  _ .-') _            .-') _                   .-. .-')   
\  ( OO )   ( OO ).-.    ( OO ) )( (  OO) )          (  OO) )                  \  ( OO )  
;-----.\   / . --. /,--./ ,--,'  \     .'_   ,-.-') /     '._ ,--.      ,-.-') ;-----.\  
| .-.  |   | \-.  \ |   \ |  |\  ,`'--..._)  |  |OO)|'--...__)|  |.-')  |  |OO)| .-.  |  
| '-' /_).-'-'  |  ||    \|  | ) |  |  \  '  |  |  \'--.  .--'|  | OO ) |  |  \| '-' /_) 
| .-. `.  \| |_.'  ||  .     |/  |  |   ' |  |  |(_/   |  |   |  |`-' | |  |(_/| .-. `.  
| |  \  |  |  .-.  ||  |\    |   |  |   / : ,|  |_.'   |  |  (|  '---.',|  |_.'| |  \  | 
| '--'  /  |  | |  ||  | \   |   |  '--'  /(_|  |      |  |   |      |(_|  |   | '--'  / 
`------'   `--' `--'`--'  `--'   `-------'   `--'      `--'   `------'  `--'   `------'  

1. About
2. Environment
3. Quick run
4. Misc


1. About

This is a C++ package for multi-armed bandit simulations. It is designed to be:

  1. Simple: easy to understand and extend, but not optimized for speed.
  2. Independent: does not require any external library.

  Arms:
  • Binary and Normal reward distributions (arms) are implemented.

  Policies (see the UCB1 sketch after this list):
  • DMED for binary rewards [1]
  • Epsilon-Greedy
  • KL-UCB [2]
  • MOSS [3]
  • Thompson sampling for binary rewards [4]
  • UCB [5]
  • UCB-V [6]
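
To give a feel for what such a policy does, the sketch below implements the core of UCB1 [5] against Bernoulli arms. It is a self-contained illustration written against C++0x; the variable names and structure are chosen for this example only and are not BanditLib's actual classes.

  // Minimal UCB1 sketch against Bernoulli arms (illustrative, not BanditLib's API).
  #include <cmath>
  #include <iostream>
  #include <limits>
  #include <random>
  #include <vector>

  int main() {
    std::mt19937 rng(12345);
    const std::vector<double> means = {0.3, 0.5, 0.7};   // true arm means
    const int K = static_cast<int>(means.size());
    const int T = 10000;                                  // horizon
    std::vector<int> pulls(K, 0);
    std::vector<double> sums(K, 0.0);
    double total = 0.0;
    for (int t = 0; t < T; ++t) {
      // Pick the arm with the highest UCB index; unpulled arms are tried first.
      int arm = 0;
      double best = -std::numeric_limits<double>::infinity();
      for (int i = 0; i < K; ++i) {
        const double index = (pulls[i] == 0)
            ? std::numeric_limits<double>::infinity()
            : sums[i] / pulls[i] + std::sqrt(2.0 * std::log(t + 1.0) / pulls[i]);
        if (index > best) { best = index; arm = i; }
      }
      // Draw a binary reward from the chosen arm and update its statistics.
      std::bernoulli_distribution draw(means[arm]);
      const double r = draw(rng) ? 1.0 : 0.0;
      pulls[arm] += 1;
      sums[arm] += r;
      total += r;
    }
    std::cout << "average reward: " << total / T << std::endl;
    return 0;
  }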

2. Environment

This program targets a Linux/GNU C++ environment. Windows and Mac OS X are not tested.

Specifically, this program depends on:

  • C++0x: a modern C++ compiler (preferably GNU C++ (g++))
  • waf (included) [7]: build script
  • cmdline.h (included) [8]: command line parser

3. Quick run

Type

./compile
./build/main -r 10

to run 10 simulation runs. The results will be written to out/example1.txt.

This package also includes a simple plotting tool (simpleplot.py) that depends on Python/Matplotlib. If your environment has both g++ and Python ready, try

./example.sh

4. Misc

The implementation of the beta distribution sampler is from [9]. The logo was generated using [10].
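
Beta sampling is what Thompson sampling for binary rewards [4] needs: an arm with s successes and f failures (under a uniform prior) has posterior Beta(s+1, f+1). Since the C++ standard library provides std::gamma_distribution but no beta distribution, a standard construction is to normalize two independent gamma draws: if X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), then X / (X + Y) ~ Beta(a, b). The sketch below shows this construction; it is an illustration only and not necessarily the exact code used in BanditLib.

  // Illustrative beta sampler built from two gamma draws (not BanditLib's exact code).
  #include <iostream>
  #include <random>

  double sample_beta(double a, double b, std::mt19937 &rng) {
    std::gamma_distribution<double> gx(a, 1.0), gy(b, 1.0);
    const double x = gx(rng);
    const double y = gy(rng);
    return x / (x + y);   // X / (X + Y) ~ Beta(a, b)
  }

  int main() {
    std::mt19937 rng(42);
    // Posterior sample for an arm with 3 successes and 7 failures.
    std::cout << sample_beta(3 + 1, 7 + 1, rng) << std::endl;
    return 0;
  }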

References

[1] J. Honda and A. Takemura: An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning 85(3), 2011, pp. 361-391.

[2] A. Garivier and O. Cappé: The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. COLT 2011, pp. 359-376.

[3] J.-Y. Audibert and S. Bubeck: Minimax Policies for Adversarial and Stochastic Bandits. COLT 2009.

[4] W. R. Thompson: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3-4), 1933, pp. 285-294.

[5] P. Auer, N. Cesa-Bianchi, and P. Fischer: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2002, pp. 235-256.

[6] J.-Y. Audibert, R. Munos, and Cs. Szepesvári: Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science 410(19), 2009, pp. 1876-1902.

[7] Waf - The meta build system: https://code.google.com/p/waf/

[8] Hideyuki Tanaka: cmdline. https://github.com/tanakh/cmdline

[9] Joseph Mansfield: a comment on Stack Overflow. http://stackoverflow.com/questions/15165202/random-number-generat

[10] Text to Ascii Art Maker: http://patorjk.com/software/taag/

Author

Junpei Komiyama (junpei.komiyama atmark gmail.com)

This software is released under the MIT License; see LICENSE.txt.