_ _ _ _ _ _ _
| | | (_) | | (_) |
| |__ __ _ _ __ __| |_| |_ _ __ _ _| |_| |__
| '_ \ / _` | '_ \ / _` | | __| '_ \| | | | | | '_ \
| |_) | (_| | | | | (_| | | |_| |_) | |_| | | | |_) |
|_.__/ \__,_|_| |_|\__,_|_|\__| .__/ \__, |_|_|_.__/
| | __/ |
|_| |___/
A lightweight python library for bandit algorithms
This library is intended for the fast and robust development of bandit algorithms. It has the following features:
- object-oriented design
- multiprocessing support
- friendly runtime info
The library consists of four components, i.e., arms, bandits, learners and protocols, which are explained in the following:
- arms: a set of arms used to build bandit environments
- bandits: a set of bandit environments
- learners: a set of bandit algorithms
- protocols: a set of protocols which are used to coordinate the interactions between the learner and the bandit environment
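These four components map to subpackages of the library. As a minimal sketch of how they relate (the module paths below are assumptions based on the quick-start example later in this readme; please check the documentation for the authoritative API):
# arms: building blocks of bandit environments
from banditpylib.arms import BernoulliArm
# bandits: environments built from arms
from banditpylib.bandits import OrdinaryBandit
# learners: bandit algorithms run against an environment
from banditpylib.learners.ordinary_learner import UCB
# protocols: coordinate the interaction between a learner and an environment
from banditpylib.protocols import SinglePlayerProtocol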
Please check the next section for the implemented policies.
Policies aiming to maximize the total rewards:

Bandit Type | Policies |
---|---|
Ordinary Bandit | Uniform, EpsGreedy, UCB, ThompsonSampling, UCBV, MOSS |
Ordinary MNL Bandit | EpsGreedy, UCB, ThompsonSampling |

Policies aiming to identify the best arm with a fixed budget:

Bandit Type | Policies |
---|---|
Ordinary Bandit | Uniform, SR, SH |

Policies aiming to identify the best arm with a fixed confidence:

Bandit Type | Policies |
---|---|
Ordinary Bandit | ExpGap, LilUCBHeuristic |
For a detailed description, please check the documentation.
Python version requirement: 3.6+.
Virtual environment: in order not to pollute your own environment, it is suggested to use a Python virtual environment. The following commands create and activate a virtual environment.
# create a virtual environment `.env`
python -m venv .env
# activate the environment
source .env/bin/activate
Then you can run the following command to install the banditpylib library. Note that this command creates a symbolic link to the library, which means any changes you make to the library take effect immediately when you re-import it.
# run under `banditpylib` root directory
pip install -e .
After you finish using the library, run deactivate to deactivate the virtual environment. You can also safely delete the whole .env directory to clean up.
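For example (assuming the virtual environment was created in .env as above):
# deactivate the virtual environment
deactivate
# optionally remove the environment directory entirely
rm -rf .env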
Suppose we want to run the Epsilon Greedy, UCB and Thompson Sampling algorithms, which aim to maximize the total rewards, against an ordinary multi-armed bandit environment with 3 Bernoulli arms. The following code block shows the main logic.
# import the required components
# (module paths are assumed from the library's package layout; see the documentation)
from banditpylib.arms import BernoulliArm
from banditpylib.bandits import OrdinaryBandit
from banditpylib.learners.ordinary_learner import EpsGreedy, UCB, ThompsonSampling
from banditpylib.protocols import SinglePlayerProtocol

# real means of Bernoulli arms
means = [0.3, 0.5, 0.7]
# create Bernoulli arms
arms = [BernoulliArm(mean) for mean in means]
# create an ordinary multi-armed bandit environment
bandit = OrdinaryBandit(arms=arms)
# horizon of the game
horizon = 2000
# create learners aiming to maximize the total rewards
learners = [EpsGreedy(arm_num=len(arms), horizon=horizon),
            UCB(arm_num=len(arms), horizon=horizon),
            ThompsonSampling(arm_num=len(arms), horizon=horizon)]
# record intermediate regrets for each trial
intermediate_regrets = list(range(0, horizon+1, 50))
# set up the simulator using the single-player protocol
game = SinglePlayerProtocol(bandit=bandit,
                            learners=learners,
                            intermediate_regrets=intermediate_regrets)
# start playing the game; for each setup we run 200 trials
game.play(trials=200)
The following figure shows the simulation results.
Please check this notebook for more details.
# run all tests
pytest
If you use this library in your research, please consider citing:

@misc{BanditPyLib,
  title = {{BanditPyLib: a lightweight python library for bandit algorithms}},
  author = {Chester Holtz and Chao Tao},
  year = {2020},
  url = {https://github.com/Alanthink/banditpylib},
  howpublished = {Online at: \url{https://github.com/Alanthink/banditpylib}},
  note = {Documentation at \url{https://alanthink.github.io/banditpylib-doc}}
}
This project is licensed under the MIT License - see the LICENSE.txt file for details.
- This project is inspired by libbandit and banditlib, which are both C++ libraries for bandit algorithms.
- This readme file follows the style of README-Template.md.
- The title was generated by TAAG.