fanta-mnix/vw-bandit
Python implementation of multi-armed bandit using epsilon-greedy exploration and reward-average sampling estimation
Jupyter Notebook · MIT License
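
Below is a minimal sketch of the approach named in the description: epsilon-greedy exploration combined with sample-average (reward-averaging) value estimates. The class name `EpsilonGreedyBandit`, the parameters, and the Bernoulli test arms are illustrative assumptions, not code taken from this repository.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy agent with sample-average (reward-averaging) estimates.

    Illustrative sketch; not the repository's actual implementation.
    """

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running average reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the arm with the highest estimated value.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental sample-average update: Q_{n} = Q_{n-1} + (r - Q_{n-1}) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Simulated Bernoulli arms with hidden success probabilities.
    true_probs = [0.2, 0.5, 0.75]
    bandit = EpsilonGreedyBandit(n_arms=len(true_probs), epsilon=0.1, seed=42)
    env_rng = random.Random(0)

    for _ in range(10_000):
        arm = bandit.select_arm()
        reward = 1.0 if env_rng.random() < true_probs[arm] else 0.0
        bandit.update(arm, reward)

    print("estimated values:", [round(v, 3) for v in bandit.values])
    print("pull counts:     ", bandit.counts)
```

With enough pulls, the sample averages converge toward the true arm means and the greedy choice concentrates on the best arm, while the epsilon fraction of random pulls keeps updating the estimates for the other arms.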