Introduction and implementation of strategies (including Thompson Sampling) for the multi-armed bandit problem.
For a more detailed description, see MultiArmedBandit.pdf.
This problem is also called the exploitation/exploration problem.
Exploitation plays the arm with the highest estimated value, based on the success counts and rewards of previous plays.
Exploration plays an arm that does not currently have the highest estimated value, in order to learn more about it.
The trade-off between exploitation and exploration also arises in reinforcement learning and in applications such as:
- Real-time bidding for online advertising
- Website/app optimization
- Other Markov decision process problems
#Algorithm
##Epsilon-Greedy
##SoftMax
##Upper Confidence Bound
##Thompson Sampling
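As a sketch of the idea for Bernoulli rewards: keep a Beta posterior for each arm, sample one value from each posterior, and play the arm with the largest sample. The class below is illustrative only (names are not necessarily those used in this repository) and uses breeze's Beta distribution, the library this project depends on (see Implementation); newer breeze versions may require an explicit RandBasis.

```scala
import breeze.stats.distributions.Beta

// Thompson Sampling sketch for Bernoulli rewards: Beta(1, 1) prior on each arm,
// posterior Beta(1 + successes, 1 + failures) after observing 0/1 rewards.
class ThompsonSamplingSketch(nArms: Int) {
  private val successes = Array.fill(nArms)(0.0)
  private val failures  = Array.fill(nArms)(0.0)

  // Sample one value from each arm's posterior and play the arm with the largest sample.
  def selectArm(): Int = {
    val samples = Array.tabulate(nArms) { i =>
      new Beta(1.0 + successes(i), 1.0 + failures(i)).draw()
    }
    samples.indexOf(samples.max)
  }

  // Update the posterior of the played arm with the observed 0/1 reward.
  def update(chosenArm: Int, reward: Double): Unit =
    if (reward > 0.0) successes(chosenArm) += 1.0 else failures(chosenArm) += 1.0
}
```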
#Simulate
MCMC and a Bernoulli bandit are used to compare the performance of the four algorithms.
```scala
import scala.util.Random

// A simulated bandit whose i-th arm pays reward 1.0 with probability probabilityArms(i), and 0.0 otherwise.
class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double = {
    require(i < probabilityArms.length)
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
  }
}
```
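For example, a three-armed bandit could be created like this (the probabilities are arbitrary illustration values):

```scala
// Arm 2 has the highest success probability, so it pays 1.0 about 80% of the time.
val bandit = new BernoulliBandit(Array(0.1, 0.3, 0.8))
val reward = bandit.draw(2) // 1.0 or 0.0
```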
```scala
// Interface shared by all bandit strategies (Epsilon-Greedy, SoftMax, UCB, Thompson Sampling).
trait BanditAlgorithm {
  def selectArm(): Int                               // choose the next arm to play
  def update(chosenArm: Int, reward: Double): Unit   // record the reward observed for the chosen arm
  def initialize(n_arms: Int): Unit                  // reset the algorithm's state for n_arms arms
}
```
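As an example of how this trait might be implemented, here is a minimal Epsilon-Greedy sketch (it may differ from the implementation in this repository): with probability epsilon it explores a random arm, otherwise it exploits the arm with the highest estimated value.

```scala
import scala.util.Random

// Epsilon-Greedy sketch: explore a random arm with probability epsilon, otherwise exploit the best arm so far.
class EpsilonGreedySketch(epsilon: Double) extends BanditAlgorithm {
  private var counts: Array[Int] = Array()
  private var values: Array[Double] = Array()

  def initialize(n_arms: Int): Unit = {
    counts = Array.fill(n_arms)(0)
    values = Array.fill(n_arms)(0.0)
  }

  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(values.length)  // explore
    else values.indexOf(values.max)                                   // exploit

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    // incremental update of the running average reward for the chosen arm
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```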
#Experiment
We compare the four algorithms on accuracy, average reward, and cumulative reward; below, only the chart of cumulative reward performance is shown.
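A simulation loop along the following lines can produce such cumulative-reward curves; the helper name and signature below are illustrative, not taken from this repository.

```scala
// Play one algorithm against the bandit for `horizon` steps and record the cumulative reward after each step.
def cumulativeRewards(algo: BanditAlgorithm, bandit: BernoulliBandit, horizon: Int): Array[Double] = {
  algo.initialize(bandit.probabilityArms.length)
  val curve = Array.fill(horizon)(0.0)
  var total = 0.0
  for (t <- 0 until horizon) {
    val arm = algo.selectArm()
    val reward = bandit.draw(arm)
    algo.update(arm, reward)
    total += reward
    curve(t) = total
  }
  curve
}
```

Averaging the curves over many independent simulations smooths out the randomness of any single run.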
#Implementation
- Build it yourself
- This project can be built with sbt 0.13. The breeze package is used for the Beta distribution.
- For sbt, add the following to your project definition:
```scala
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.11.2"
)
```
#Reference
Wikipedia article on the multi-armed bandit
Bandit Algorithms for Website Optimization by John Myles White
Analysis of Thompson Sampling for the Multi-armed Bandit Problem by Shipra Agrawal and Navin Goyal
An Information-Theoretic Analysis of Thompson Sampling by Daniel Russo and Benjamin Van Roy
#To Do
- Contextual Bandit
- Stochastic Bandit
- Implement on Spark to support parallel and online learning