# MultiArmedBandit

Introduction and implementation of strategies (including Thompson Sampling) for the multi-armed bandit problem.

For a more detailed description, see MultiArmedBandit.pdf.

# Definition

This problem is also known as the exploitation/exploration trade-off.

Exploitation plays the arm with the highest estimated value, based on the success counts and rewards of previous plays.

Exploration plays arms other than the one with the highest estimated value, in order to gather more information about them.

The trade-off between exploitation and exploration is also faced in reinforcement learning.

# Application

  • Real-time bidding for online advertising
  • Website/app optimization
  • Other Markov decision process problems

# Algorithm

## Epsilon-Greedy

## SoftMax

## Upper Confidence Bound

## Thompson Sampling
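In brief, the standard forms of these four selection rules (which may differ slightly in notation from the variants implemented here) are, with $\hat{\mu}_i$ the estimated mean reward of arm $i$, $n_i$ its play count, and $n$ the total number of plays:

  • Epsilon-Greedy: with probability $\varepsilon$ play a uniformly random arm, otherwise play $\arg\max_i \hat{\mu}_i$.
  • SoftMax: play arm $i$ with probability $\exp(\hat{\mu}_i/\tau) \,/\, \sum_j \exp(\hat{\mu}_j/\tau)$ for a temperature parameter $\tau$.
  • Upper Confidence Bound (UCB1): play $\arg\max_i \left(\hat{\mu}_i + \sqrt{2\ln n / n_i}\right)$.
  • Thompson Sampling (Bernoulli rewards): sample $\theta_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ for each arm and play $\arg\max_i \theta_i$, incrementing $\alpha_i$ after a success and $\beta_i$ after a failure.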

# Simulate

Use MCMC and a Bernoulli bandit to test and compare the performance of the four algorithms.

import scala.util.Random

// A Bernoulli bandit: arm i pays a reward of 1.0 with probability probabilityArms(i), otherwise 0.0.
class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double = {
    require(i < probabilityArms.length)
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
  }
}
// Common interface implemented by every bandit strategy.
trait BanditAlgorithm {
  // Choose which arm to play next.
  def selectArm(): Int
  // Record the reward observed after playing chosenArm.
  def update(chosenArm: Int, reward: Double): Unit
  // Reset the internal state for a bandit with n_arms arms.
  def initialize(n_arms: Int): Unit
}
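As an illustration of how a concrete strategy plugs into this trait, here is a minimal epsilon-greedy sketch (the class name, fields, and epsilon value are illustrative assumptions, not necessarily the ones used in this repository):

```scala
import scala.util.Random

// Epsilon-greedy: with probability epsilon explore a uniformly random arm,
// otherwise exploit the arm with the highest estimated mean reward.
class EpsilonGreedy(epsilon: Double) extends BanditAlgorithm {
  private var counts: Array[Int] = Array.empty
  private var values: Array[Double] = Array.empty

  def initialize(n_arms: Int): Unit = {
    counts = Array.fill(n_arms)(0)
    values = Array.fill(n_arms)(0.0)
  }

  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(values.length)
    else values.indexOf(values.max)

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    // Incremental update of the running mean reward for the chosen arm.
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```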

# Experiment

We can compare the four algorithms on accuracy, average reward, and cumulative reward.
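A rough sketch of how such a comparison can be run, pairing the BernoulliBandit above with a strategy such as the epsilon-greedy sketch from the previous section (the arm probabilities, epsilon, and horizon below are arbitrary example values, not the settings behind the charts):

```scala
object Simulation {
  // Play one algorithm against a Bernoulli bandit for `horizon` pulls and
  // return the cumulative reward collected along the way.
  def simulate(algo: BanditAlgorithm, bandit: BernoulliBandit, horizon: Int): Double = {
    algo.initialize(bandit.probabilityArms.length)
    var cumulativeReward = 0.0
    for (_ <- 1 to horizon) {
      val arm = algo.selectArm()
      val reward = bandit.draw(arm)
      algo.update(arm, reward)
      cumulativeReward += reward
    }
    cumulativeReward
  }

  def main(args: Array[String]): Unit = {
    // Example: three arms with success probabilities 0.1, 0.3, 0.7 and 1000 pulls.
    val bandit = new BernoulliBandit(Array(0.1, 0.3, 0.7))
    println(simulate(new EpsilonGreedy(0.1), bandit, 1000))
  }
}
```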

In the following, only the chart of cumulative reward is shown.

[Chart: cumulative reward of the four algorithms]

# Implementation

  • Build it yourself.
  • This project can be built with sbt 0.13. The breeze package is used to implement the beta distribution (needed for Thompson Sampling; see the sketch after the dependency snippet below).
  • Add these lines to your sbt project definition:
libraryDependencies ++= Seq(
"org.scalanlp" %% "breeze" % "0.11.2"
)
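For example, the Beta posterior sampling at the core of Thompson Sampling can be written with breeze roughly as follows (a sketch assuming the breeze 0.11.x Beta API with its default implicit random generator; the class and field names are illustrative, not the ones in this repository):

```scala
import breeze.stats.distributions.Beta

// Thompson Sampling for Bernoulli rewards: keep a Beta(successes + 1, failures + 1)
// posterior per arm, draw one sample from each posterior, and play the arm with
// the largest sample.
class ThompsonSampling extends BanditAlgorithm {
  private var successes: Array[Double] = Array.empty
  private var failures: Array[Double] = Array.empty

  def initialize(n_arms: Int): Unit = {
    successes = Array.fill(n_arms)(0.0)
    failures = Array.fill(n_arms)(0.0)
  }

  def selectArm(): Int = {
    // One posterior sample per arm; pick the arm whose sample is largest.
    val samples = successes.indices.map(i => new Beta(successes(i) + 1.0, failures(i) + 1.0).draw())
    samples.indexOf(samples.max)
  }

  def update(chosenArm: Int, reward: Double): Unit =
    if (reward > 0.0) successes(chosenArm) += 1.0 else failures(chosenArm) += 1.0
}
```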

# Reference

Wikipedia: Multi-armed bandit

Bandit Algorithms for Website Optimization by John Myles White

Analysis of Thompson Sampling for the Multi-armed Bandit Problem by Shipra Agrawal and Navin Goyal

An Information-Theoretic Analysis of Thompson Sampling by Daniel Russo and Benjamin Van Roy

# To Do

  • Contextual Bandit
  • Stochastic Bandit
  • Implementation on Spark to support parallel and online learning