Reinforcement Learning Practical - Project 1

K-Armed bandit problem

The program runs several independent instances of a learning algorithm on the K-armed bandit problem. The following parameters are set through command-line arguments:

The value distribution of the arms.
The learning algorithm to perform on the problem.
The parameter for the algorithm.

And optionally:

The number of instances (N runs) - Default: 20000
The number of arms (K actions) - Default: 10
The number of time steps in a run (T steps) - Default: 1000

How to run the program with these parameters is described in detail under: Run the program.

Compile the C source code (gcc)

The code can be compiled with:
gcc bandit.c safeAlloc.c -o bandit -O3 -lm

Run the program

The program can be run with:
./bandit <Value distribution> <Algorithm> <Param 1> [N-Runs] [K-Arms] [T-Steps]

The arguments must be specified according to the following rules:

Value distribution: The value distribution of the arms. Select either 0 or 1.

Gaussian: 0
Bernoulli: 1

Algorithm: The learning algorithm to perform on the problem. Select 0, 1, 2 or 3.

Epsilon Greedy: 0
Reinforcement Comparison: 1
Pursuit Method: 2
Stochastic Gradient Ascent: 3

Param 1: Parameter for the algorithm. Select any float > 0.

Epsilon Greedy: Param 1 = Epsilon
Reinforcement Comparison: Param 1 = Beta
Pursuit Method: Param 1 = Beta
Stochastic Gradient Ascent: Param 1 = Alpha
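
To illustrate how Param 1 enters an algorithm, the following is a minimal sketch of Epsilon Greedy action selection, where Param 1 plays the role of Epsilon. It is not taken from bandit.c: the function names greedyArm, epsilonGreedyArm and updateEstimate are hypothetical, and the sample-average update of the value estimates is an assumption about how the estimates could be maintained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: index of the largest estimate (first one wins ties). */
int greedyArm(const double *estimates, int k) {
    assert(k > 0);
    int best = 0;
    for (int a = 1; a < k; a++) {
        if (estimates[a] > estimates[best]) best = a;
    }
    return best;
}

/* With probability epsilon pick a uniformly random arm (explore),
 * otherwise pick the arm with the highest estimate (exploit). */
int epsilonGreedyArm(const double *estimates, int k, double epsilon) {
    assert(k > 0);
    double u = (double)rand() / ((double)RAND_MAX + 1.0); /* u in [0, 1) */
    if (u < epsilon) return rand() % k;
    return greedyArm(estimates, k);
}

/* Assumed update rule: incremental sample average of the rewards
 * observed so far for the chosen arm. */
void updateEstimate(double *estimates, int *counts, int arm, double reward) {
    counts[arm]++;
    estimates[arm] += (reward - estimates[arm]) / counts[arm];
}
```

A larger Epsilon makes the random branch fire more often, so the agent explores more and exploits less. The other three algorithms use Param 1 as a step size (Beta or Alpha) rather than as a probability.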

N-Runs: (Optional) The number of instances (N runs). Select any integer N > 0.

K-Arms: (Optional) The number of arms (K actions). Select any integer K > 0.
(Note that this parameter cannot be set without also providing N-Runs)

T-Steps: (Optional) The number of time steps in a run (T steps). Select any integer T > 0.
(Note that this parameter cannot be set without also providing N-Runs and K-Arms)
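
The optional arguments are positional, which is why K-Arms requires N-Runs and T-Steps requires both. A minimal sketch of how such arguments could be read is shown below. The struct BanditConfig and the function parseBanditArgs are hypothetical and do not come from bandit.c. The defaults and valid ranges follow the rules listed above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the settings; not taken from bandit.c. */
typedef struct {
    int distribution; /* 0 = Gaussian, 1 = Bernoulli */
    int algorithm;    /* 0..3, as listed above */
    double param;     /* Epsilon, Beta or Alpha, depending on the algorithm */
    int nRuns;        /* default 20000 */
    int kArms;        /* default 10 */
    int tSteps;       /* default 1000 */
} BanditConfig;

/* Fills cfg from the command line; returns 0 on success, -1 on invalid input.
 * Optional arguments are positional: the i-th one can only be given
 * if all earlier ones are given too. */
int parseBanditArgs(int argc, char *argv[], BanditConfig *cfg) {
    assert(cfg != NULL);
    if (argc < 4 || argc > 7) return -1;

    cfg->distribution = atoi(argv[1]);
    cfg->algorithm = atoi(argv[2]);
    cfg->param = atof(argv[3]);
    cfg->nRuns = (argc > 4) ? atoi(argv[4]) : 20000;
    cfg->kArms = (argc > 5) ? atoi(argv[5]) : 10;
    cfg->tSteps = (argc > 6) ? atoi(argv[6]) : 1000;

    if (cfg->distribution < 0 || cfg->distribution > 1) return -1;
    if (cfg->algorithm < 0 || cfg->algorithm > 3) return -1;
    if (cfg->param <= 0.0) return -1;
    if (cfg->nRuns <= 0 || cfg->kArms <= 0 || cfg->tSteps <= 0) return -1;
    return 0;
}
```

Under these rules, ./bandit 0 0 0.1 would run Epsilon Greedy with Epsilon = 0.1 on Gaussian arms using all defaults, while ./bandit 1 2 0.05 500 5 would run the Pursuit Method with Beta = 0.05 on Bernoulli arms for 500 runs with 5 arms and the default 1000 time steps.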
