Thompson Sampling is an algorithm for learning which action, out of a given set of actions, yields the highest reward, when the reward from each action is a random variable. For a mathematical treatment of this algorithm's optimality properties, I recommend reading here. There are many introductory posts on Medium on the topic, for example the following ones, which overlap substantially: (1, 2, 3).
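As a rough illustration (not this repo's actual implementation), here is a minimal sketch of Thompson Sampling for a Bernoulli bandit with Beta priors; the function name, arm probabilities, and parameters below are my own choices for the example:

```python
# Minimal, self-contained sketch of Thompson Sampling for Bernoulli-reward arms
# with Beta priors. Illustrative only; it does not mirror this repo's code.
import numpy as np

def thompson_sampling(true_probs, n_rounds=10_000, rng=None):
    """Run Thompson Sampling against arms with the given Bernoulli reward probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(true_probs)
    successes = np.zeros(n_arms)  # observed rewards per arm (Beta alpha - 1)
    failures = np.zeros(n_arms)   # observed non-rewards per arm (Beta beta - 1)
    total_reward = 0

    for _ in range(n_rounds):
        # Sample one plausible reward rate per arm from its Beta posterior...
        theta = rng.beta(successes + 1, failures + 1)
        # ...and play the arm whose sampled rate is highest.
        arm = int(np.argmax(theta))
        reward = rng.random() < true_probs[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    return successes, failures, total_reward

if __name__ == "__main__":
    s, f, r = thompson_sampling([0.1, 0.5, 0.7])
    print("pulls per arm:", (s + f).astype(int), "total reward:", int(r))
```

Arms with few observations keep wide posteriors and therefore still get sampled occasionally, which is how the algorithm balances exploration against exploitation.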
Compared to the equivalent code from the article that inspired it, this code may be more conducive to further experimentation with interesting variants.
To run it locally after cloning the repo, perhaps with different parameters or other changes of your own, you need its Python dependencies available in your Python environment. I developed in a dedicated Anaconda virtual environment and have exported its definition, so after cloning this repo you can recreate it, e.g. via:
conda env create --name mab -f environment.yml
You may replace the name mab with any name you'd like for this virtual environment.
Of course, this assumes you have Anaconda installed. You'd then activate the environment on your end in the usual way (e.g. conda activate mab); the exact syntax varies slightly depending on your OS and shell.
Alternatively, use the requirements.txt file to install the dependencies directly with pip3 (e.g. pip3 install -r requirements.txt).
Or, just install whatever packages come up as import errors, since my exported environment contains many packages unnecessary for running this project.