Bandit Algorithm Tutorial

This tutorial explains the basic concepts of regret upper and lower bounds for bandit algorithms, using the epsilon-greedy policy as an example.
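To make the idea concrete, here is a minimal sketch of an epsilon-greedy policy on a Bernoulli bandit that also tracks cumulative pseudo-regret (the sum of gaps between the best arm's mean and the mean of the arm actually pulled). The function name `run_epsilon_greedy`, its parameters, and the choice of Bernoulli rewards are assumptions made for illustration, not part of the original tutorial.

```python
import random


def run_epsilon_greedy(means, epsilon, horizon, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    means   -- true success probability of each arm (hidden from the policy)
    epsilon -- probability of pulling a uniformly random arm at each step
    horizon -- number of rounds to play
    Returns (cumulative pseudo-regret, pull count per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # empirical mean reward per arm
    best_mean = max(means)
    regret = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest empirical mean
            # (ties go to the lowest index).
            arm = max(range(k), key=lambda a: estimates[a])

        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0

        # Update the running average for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Pseudo-regret uses the true means, not the noisy rewards.
        regret += best_mean - means[arm]

    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = run_epsilon_greedy([0.2, 0.5, 0.8], epsilon=0.1, horizon=10_000)
    print("cumulative pseudo-regret:", total_regret)
    print("pulls per arm:", pulls)
```

With a fixed epsilon, every round has a constant chance of exploring a suboptimal arm, so the expected regret of this sketch grows linearly in the horizon. Decaying epsilon over time is the usual way to aim for a sublinear upper bound.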