RL-KArmed-Bandit

K-armed bandit problem approached with a variety of action-selection learning algorithms.


Reinforcement Learning Practical - Project 1

K-Armed bandit problem

The program can run several instances of a learning algorithm on the K-Armed bandit problem. By providing command-line arguments, the following parameters of the problem can be set:

  • The value distribution of the arms.
  • The learning algorithm to perform on the problem.
  • The parameter for the algorithm.

And optionally:

  • The number of instances (N runs) - Default: 20000
  • The number of arms (K actions) - Default: 10
  • The number of time steps in a run (T steps) - Default: 1000
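
As a rough illustration of how these defaults and their positional overrides could be handled, the sketch below reads the optional arguments in order (which is why K-Arms requires N-Runs, and T-Steps requires both). The variable names are hypothetical and do not necessarily match the actual code in bandit.c.

  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char *argv[]) {
      /* Defaults as listed above; each optional argument is positional,
         so a later one can only be given if the earlier ones are too.
         Names are illustrative, not taken from bandit.c. */
      int nRuns  = (argc > 4) ? atoi(argv[4]) : 20000; /* N runs    */
      int kArms  = (argc > 5) ? atoi(argv[5]) : 10;    /* K actions */
      int tSteps = (argc > 6) ? atoi(argv[6]) : 1000;  /* T steps   */
      printf("N = %d, K = %d, T = %d\n", nRuns, kArms, tSteps);
      return 0;
  }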

How to run the program with these parameters is described in more detail in: Run the program.

Compile the C source code (gcc)

The code can be compiled through:
gcc bandit.c safeAlloc.c -o bandit -O3 -lm

Run the program

The program can be run through:
./bandit <Value distribution> <Algorithm> <Param 1> [N-Runs] [K-Arms] [T-Steps]

The arguments must be specified according to the following rules:

Value distribution: The value distribution of the arms. Select either 0 or 1 (a reward-sampling sketch follows this list).

  • Gaussian: 0
  • Bernoulli: 1
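
As an illustration of the two reward models, one way an arm's reward could be drawn is sketched below. The function names and the Box-Muller approach are assumptions for this example and are not taken from bandit.c.

  #include <stdlib.h>
  #include <math.h>

  /* Gaussian arm: reward ~ N(mean, 1), sampled via the Box-Muller transform. */
  static double sample_gaussian(double mean) {
      double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* in (0,1), avoids log(0) */
      double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
      return mean + sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
  }

  /* Bernoulli arm: reward 1 with probability p, otherwise 0. */
  static double sample_bernoulli(double p) {
      return ((double)rand() / RAND_MAX < p) ? 1.0 : 0.0;
  }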

Algorithm: The learning algorithm to perform on the problem. Select 0, 1, 2 or 3.

  • Epsilon Greedy: 0 (sketched after this list)
  • Reinforcement Comparison: 1
  • Pursuit Method: 2
  • Stochastic Gradient Ascent: 3
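
For concreteness, the first of these, Epsilon Greedy, can be sketched as follows. The incremental sample-average update and all names are assumptions for this example and not necessarily what bandit.c does.

  #include <stdlib.h>

  /* Epsilon-greedy action selection: with probability epsilon pick a
     random arm, otherwise the arm with the highest value estimate.
     Q: value estimates, n: pull counts, k: number of arms. */
  static int select_epsilon_greedy(const double *Q, int k, double epsilon) {
      if ((double)rand() / RAND_MAX < epsilon)
          return rand() % k;                 /* explore */
      int best = 0;                          /* exploit */
      for (int a = 1; a < k; a++)
          if (Q[a] > Q[best]) best = a;
      return best;
  }

  /* Incremental sample-average update after observing reward r for arm a. */
  static void update_estimate(double *Q, int *n, int a, double r) {
      n[a]++;
      Q[a] += (r - Q[a]) / n[a];
  }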

Param 1: Parameter for the algorithm. Select any float > 0.

  • Epsilon Greedy: Param 1 = Epsilon
  • Reinforcement Comparison: Param 1 = Beta
  • Pursuit Method: Param 1 = Beta (update sketched after this list)
  • Stochastic Gradient Ascent: Param 1 = Alpha
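
To show what such a parameter controls, here is a sketch of the Pursuit Method's probability update, where Beta is the rate at which the action probabilities are pulled toward the currently greedy arm. Names are illustrative, not necessarily those used in bandit.c.

  /* Pursuit Method update: move each action probability pi[a] toward 1
     for the greedy arm and toward 0 for all others, at rate beta. */
  static void pursuit_update(double *pi, int k, int greedy, double beta) {
      for (int a = 0; a < k; a++) {
          double target = (a == greedy) ? 1.0 : 0.0;
          pi[a] += beta * (target - pi[a]);
      }
  }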

N-Runs: (Optional) The number of instances (N runs). Select any integer N > 0.

K-Arms: (Optional) The number of arms (K actions). Select any integer K > 0.
(Note that this parameter cannot be set without also providing N-Runs)

T-Steps: (Optional) The number of time steps in a run (T steps). Select any integer T > 0.
(Note that this parameter cannot be set without also providing N-Runs and K-Arms)
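
For example (parameter values chosen purely for illustration), Epsilon Greedy with epsilon = 0.1 on Gaussian arms using the defaults, and the Pursuit Method with beta = 0.01 on Bernoulli arms with explicit N, K and T, can be started as:
./bandit 0 0 0.1
./bandit 1 2 0.01 10000 20 500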