/bandits

A notebook exemplifying Thompson Sampling

Thompson Sampling is an algorithm for learning which action, out of a set of available actions, carries the best reward, where the reward from each action is a random variable. For a mathematical treatment of the optimality properties of this algorithm, I recommend reading here. There are many introductory posts on Medium on the topic, for example the following ones, which overlap considerably: (1, 2, 3).
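
To make the description above concrete, here is a minimal sketch of Thompson Sampling for the common Bernoulli case. It is not taken from the notebook: the function name `thompson_sampling`, the 0/1 rewards, and the uniform Beta(1, 1) prior are all assumptions chosen for illustration, and the notebook itself may model rewards differently.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling sketch; returns the pull count per arm.

    `true_probs` are the hidden success probabilities of the simulated arms
    (an assumption of this sketch; a real problem would not expose them).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior,
        # starting from a uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a 0/1 reward and update that arm's posterior counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Because each arm is chosen with the probability that it is currently believed to be the best, exploration fades on its own as the posteriors sharpen, and most pulls end up on the arm with the highest success rate.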

Why this repo?

Compared to the code and article that inspired it, this code may be more conducive to experimenting with variants of the algorithm.

Running on Google Colaboratory

Open In Colab

Running locally

To run it locally after cloning the repo, perhaps with different parameters or other changes, you need its Python dependencies available in your Python environment. I used a custom Anaconda virtual environment and exported its definition, so you can recreate it after cloning this repo, e.g. via:

conda env create --name mab -f environment.yml

You may replace the name mab with any name you'd like for this virtual environment. Of course, this assumes you have Anaconda installed. You'd then activate the environment in the usual way; the syntax varies slightly depending on your OS.

Alternatively, use the requirements.txt file to set up directly with pip3.

Or just install whichever packages come up as import errors, since my environment contains many items that are not needed to run this project.
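
If you take the last route, a small check like the following may save a few rounds of trial and error by listing up front which packages are missing. It is only a sketch: the helper `missing_packages` is hypothetical, and the candidate names are placeholders, not the notebook's actual import list.

```python
import importlib.util


def missing_packages(candidates):
    """Return the top-level module names in `candidates` that cannot be found.

    Only top-level names are supported here (e.g. "numpy", not "numpy.random").
    """
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is None
    ]


# Placeholder list (an assumption); replace it with the imports the notebook uses.
candidates = ["numpy", "scipy", "matplotlib"]
print(missing_packages(candidates))
```

Whatever the check prints is what still needs installing, for example with pip3.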