The program runs several instances of a learning algorithm on the K-armed bandit problem. The following parameters of the problem are set through command-line arguments:
- The value distribution of the arms.
- The learning algorithm to perform on the problem.
- The parameter for the algorithm.
And optionally:
- The number of instances (N runs) - Default: 20000
- The number of arms (K actions) - Default: 10
- The number of time steps in a run (T steps) - Default: 1000
See "Run the program" for a more detailed description of how to pass these parameters.
The code can be compiled with:
gcc bandit.c safeAlloc.c -o bandit -O3 -lm
The program can be run with:
./bandit <Value distribution> <Algorithm> <Param 1> [N-Runs] [K-Arms] [T-Steps]
The arguments are positional and must follow these rules:
Value distribution: The value distribution of the arms. Select either 0 or 1.
- Gaussian: 0
- Bernoulli: 1
Algorithm: The learning algorithm to perform on the problem. Select 0, 1, 2 or 3.
- Epsilon Greedy: 0
- Reinforcement Comparison: 1
- Pursuit Method: 2
- Stochastic Gradient Ascent: 3
Param 1: Parameter for the algorithm. Select any float > 0.
- Epsilon Greedy: Param 1 = Epsilon
- Reinforcement Comparison: Param 1 = Beta
- Pursuit Method: Param 1 = Beta
- Stochastic Gradient Ascent: Param 1 = Alpha
N-Runs: (Optional) The number of instances (N runs). Select any integer N > 0.
K-Arms: (Optional) The number of arms (K actions). Select any integer K > 0.
(Note that this parameter cannot be set without also providing N-Runs.)
T-Steps: (Optional) The number of time steps in a run (T steps). Select any integer T > 0.
(Note that this parameter cannot be set without also providing N-Runs and K-Arms.)