defaults: add e-greedy
MKaptein commented
We need an example default for e-greedy. Only the getaction code differs from e-first:
getcontext code: {} # Empty
getaction code:
e = .1
if binomial(p=e) == 1:
- pr(.5) -> A and pr(.5) -> B
else:
- pr(1) -> the action with max(mean(R_a), mean(R_b))
(Note: the action JSON object should include "propensity" = Pr(action). In the example code above, this can be computed as (1-e)*pr().)
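A minimal Python sketch of the getaction step, assuming the current mean rewards are kept in a dict keyed by action name. `get_action`, `means`, `EPSILON`, and `ACTIONS` are hypothetical names, not part of any existing API. The check `rng.random() < epsilon` plays the role of `binomial(p=e) == 1`.

```python
import random

EPSILON = 0.1          # e in the pseudocode above
ACTIONS = ["A", "B"]   # the two arms

def get_action(means, epsilon=EPSILON, rng=random):
    """Return (action, explored); means maps each action to its current mean reward."""
    if rng.random() < epsilon:
        # Exploration draw: pick A or B with probability 0.5 each
        return rng.choice(ACTIONS), True
    # Greedy step: always pick the action with the highest mean reward (ties go to "A")
    return max(ACTIONS, key=lambda a: means[a]), False
```

Returning the `explored` flag alongside the action makes it easy to attach the propensity to the action object afterwards.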
getreward code:
R_a ~ normal(0, 1)
R_b ~ normal(1, 1)
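A sketch of the getreward step that simulates the two arms with Python's `random.gauss(mu, sigma)`. The labels "A"/"B" and the name `get_reward` are illustrative assumptions.

```python
import random

# Arm means from the issue: R_a ~ normal(0, 1), R_b ~ normal(1, 1)
ARM_MEANS = {"A": 0.0, "B": 1.0}

def get_reward(action, rng=random):
    """Draw a simulated reward for the chosen action (standard deviation 1 for both arms)."""
    return rng.gauss(ARM_MEANS[action], 1.0)
```

Passing a seeded `random.Random(...)` as `rng` makes a simulation run reproducible.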
setreward code:
update the appropriate mean reward for the action.
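One way to implement setreward is an incremental (running) mean, so no reward history has to be stored. This sketch assumes hypothetical `means` and `counts` dicts keyed by action; `set_reward` is an illustrative name.

```python
def set_reward(means, counts, action, reward):
    """Update the running mean reward of `action` in place."""
    counts[action] += 1
    # new_mean = old_mean + (reward - old_mean) / n
    means[action] += (reward - means[action]) / counts[action]
```

For example, feeding rewards 1.0, 2.0, 3.0 to action "A" leaves `means["A"] == 2.0` and `counts["A"] == 3`.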
g0ulash commented
The propensities are e*0.5 and (1-e) (instead of (1-e)*pr(), since in the greedy step you always take the max), right? @MKaptein
MKaptein commented
Correct
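A sketch of the propensity convention agreed on above: e*0.5 when the action came from the exploration draw and (1-e) when it came from the greedy step. The `explored` flag, the function names, and the "action" key in the JSON object are assumptions; only "propensity" is named in the issue. The two-action e*0.5 is written as epsilon / n_actions.

```python
import json

def propensity(explored, epsilon=0.1, n_actions=2):
    """Propensity per the thread: e * 0.5 (two actions) if explored, else 1 - e."""
    return epsilon / n_actions if explored else 1.0 - epsilon

def action_object(action, explored, epsilon=0.1):
    """Serialize a hypothetical action object carrying the propensity."""
    return json.dumps({"action": action, "propensity": propensity(explored, epsilon)})
```

With e = .1 this logs 0.05 for an exploration draw and 0.9 for a greedy step. These values are the probability of the branch and the action together; if a consumer of the log needs the marginal Pr(action), the greedy arm's value would instead be (1-e) + e*0.5 and the other arm's e*0.5.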