Introduction and implementation of strategies (including Thompson Sampling) for the multi-armed bandit problem.
For a more detailed description, see MultiArmedBandit.pdf.
This problem is also called the exploitation/exploration problem.
Exploitation plays the arm with the highest estimated value, based on the success counts and rewards of previous plays.
Exploration plays an arm that does not currently have the highest estimated value, in order to learn more about it.
The trade-off between exploitation and exploration also arises in reinforcement learning and in applications such as:
- Real-time bidding for online advertising
- Website/app optimization
- Other Markov decision process problems
#Algorithm
##Epsilon-Greedy
##SoftMax
##Upper Confidence Bound
##Thompson Sampling
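As a sketch of the idea for Bernoulli rewards: keep a Beta posterior for each arm, sample one value from each posterior, and play the arm with the largest sample. The class below is illustrative only (names are not necessarily those used in this repository) and uses breeze's Beta distribution, the library this project depends on (see Implementation); newer breeze versions may require an explicit RandBasis.

```scala
import breeze.stats.distributions.Beta

// Thompson Sampling sketch for Bernoulli rewards: Beta(1, 1) prior on each arm,
// posterior Beta(1 + successes, 1 + failures) after observing 0/1 rewards.
class ThompsonSamplingSketch(nArms: Int) {
  private val successes = Array.fill(nArms)(0.0)
  private val failures  = Array.fill(nArms)(0.0)

  // Sample one value from each arm's posterior and play the arm with the largest sample.
  def selectArm(): Int = {
    val samples = Array.tabulate(nArms) { i =>
      new Beta(1.0 + successes(i), 1.0 + failures(i)).draw()
    }
    samples.indexOf(samples.max)
  }

  // Update the posterior of the played arm with the observed 0/1 reward.
  def update(chosenArm: Int, reward: Double): Unit =
    if (reward > 0.0) successes(chosenArm) += 1.0 else failures(chosenArm) += 1.0
}
```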
#Simulate
MCMC and a Bernoulli bandit are used to compare the performance of the four algorithms.
```scala
import scala.util.Random

// A simulated bandit whose i-th arm pays reward 1.0 with probability probabilityArms(i), and 0.0 otherwise.
class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double = {
    require(i < probabilityArms.length)
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
  }
}
```
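For example, a three-armed bandit could be created like this (the probabilities are arbitrary illustration values):

```scala
// Arm 2 has the highest success probability, so it pays 1.0 about 80% of the time.
val bandit = new BernoulliBandit(Array(0.1, 0.3, 0.8))
val reward = bandit.draw(2) // 1.0 or 0.0
```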
```scala
// Interface shared by all bandit strategies (Epsilon-Greedy, SoftMax, UCB, Thompson Sampling).
trait BanditAlgorithm {
  def selectArm(): Int                               // choose the next arm to play
  def update(chosenArm: Int, reward: Double): Unit   // record the reward observed for the chosen arm
  def initialize(n_arms: Int): Unit                  // reset the algorithm's state for n_arms arms
}
```
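As an example of how this trait might be implemented, here is a minimal Epsilon-Greedy sketch (it may differ from the implementation in this repository): with probability epsilon it explores a random arm, otherwise it exploits the arm with the highest estimated value.

```scala
import scala.util.Random

// Epsilon-Greedy sketch: explore a random arm with probability epsilon, otherwise exploit the best arm so far.
class EpsilonGreedySketch(epsilon: Double) extends BanditAlgorithm {
  private var counts: Array[Int] = Array()
  private var values: Array[Double] = Array()

  def initialize(n_arms: Int): Unit = {
    counts = Array.fill(n_arms)(0)
    values = Array.fill(n_arms)(0.0)
  }

  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(values.length)  // explore
    else values.indexOf(values.max)                                   // exploit

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    // incremental update of the running average reward for the chosen arm
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```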
#Experiment
We compare the four algorithms on accuracy, average reward, and cumulative reward; below, only the chart of cumulative reward performance is shown.
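A simulation loop along the following lines can produce such cumulative-reward curves; the helper name and signature below are illustrative, not taken from this repository.

```scala
// Play one algorithm against the bandit for `horizon` steps and record the cumulative reward after each step.
def cumulativeRewards(algo: BanditAlgorithm, bandit: BernoulliBandit, horizon: Int): Array[Double] = {
  algo.initialize(bandit.probabilityArms.length)
  val curve = Array.fill(horizon)(0.0)
  var total = 0.0
  for (t <- 0 until horizon) {
    val arm = algo.selectArm()
    val reward = bandit.draw(arm)
    algo.update(arm, reward)
    total += reward
    curve(t) = total
  }
  curve
}
```

Averaging the curves over many independent simulations smooths out the randomness of any single run.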
#Implementation
- Build it yourself
- This project can be built with sbt 0.13. The breeze package is used for the Beta distribution.
- For sbt, add the following to your project definition:
```scala
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.11.2"
)
```
#Reference
Wikipedia article on the multi-armed bandit
Bandit Algorithms for Website Optimization by John Myles White
Analysis of Thompson Sampling for the Multi-armed Bandit Problem by Shipra Agrawal and Navin Goyal
An Information-Theoretic Analysis of Thompson Sampling by Daniel Russo and Benjamin Van Roy
#To Do
- Contextual Bandit
- Stochastic Bandit
- Implement on Spark to support parallel and online learning