Nth-iteration-labs/streamingbandit

defaults: add e-greedy

Closed this issue · 2 comments

We need an example default for e-greedy. Only the getaction code differs from e-first:

getcontext code: {} # Empty

getaction code:
e = .1
if (binomial(p=e) == 1):

  • pr(.5) -> A and pr(.5) -> B iff t < n=100

else:

  • pr(1) -> max(mean(R_a), mean(R_b)) otherwise

(Note: the action JSON object should include "propensity" = Pr(action). This can be computed using (1-e)*pr() in the above example code.)
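As a rough illustration of the getaction step, here is a minimal Python sketch. The names `get_action`, `means`, and `rng` are hypothetical and are not part of the streamingbandit API. The propensity values follow the per-branch numbers confirmed later in this thread (e*0.5 when exploring, 1-e when exploiting). The unconditional probability of the greedy arm would also include the e*0.5 chance of drawing it while exploring.

```python
import random

def get_action(means, e=0.1, rng=random):
    """Hypothetical e-greedy getaction: returns an action and its propensity.

    `means` maps each arm name to its current mean reward,
    e.g. {"A": 0.0, "B": 1.0}. Two arms are assumed, as in the issue.
    """
    if rng.random() < e:
        # Explore: pick A or B uniformly at random.
        action = rng.choice(sorted(means))
        propensity = e * 0.5
    else:
        # Exploit: pick the arm with the highest mean reward so far.
        action = max(means, key=means.get)
        propensity = 1 - e
    return {"action": action, "propensity": propensity}
```

With `e=0.0` the function always exploits, and with `e=1.0` it always explores, which makes the two branches easy to check in isolation.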

getreward code:
R_a ~ normal(0, 1)
R_b ~ normal(1, 1)

setreward code:
Update the mean reward of the action that was played.
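The getreward and setreward steps could look roughly like the sketch below. It assumes rewards are simulated from the two normal distributions listed above and that the per-action mean is kept as a running average. `REWARD_PARAMS`, `get_reward`, `set_reward`, and the `state` dictionary are hypothetical names, not streamingbandit functions.

```python
import random

# Simulated reward distributions from the issue:
# R_a ~ normal(0, 1), R_b ~ normal(1, 1), stored as (mean, std dev).
REWARD_PARAMS = {"A": (0.0, 1.0), "B": (1.0, 1.0)}

def get_reward(action, rng=random):
    """Draw a simulated reward for the chosen action."""
    mu, sigma = REWARD_PARAMS[action]
    return rng.gauss(mu, sigma)

def set_reward(state, action, reward):
    """Update the (count, mean) pair for `action` in place.

    Uses the incremental mean update: mean += (reward - mean) / count.
    """
    count, mean = state.get(action, (0, 0.0))
    count += 1
    mean += (reward - mean) / count
    state[action] = (count, mean)
    return state
```

Storing the count next to the mean avoids keeping the full reward history, which suits a streaming setting where rewards arrive one at a time.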

The propensities are e*0.5 and (1-e) (instead of (1-e)*pr(), since in the greedy step you will always take the max), right? @MKaptein

Correct