ashish1401/MAB-UCB

A code implementation of Upper Confidence Bound (UCB), a well-known reinforcement learning algorithm for the Multi-Armed Bandit problem. The program returns the number of pulls per "arm" chosen by the UCB algorithm as it works to minimize regret.

Jupyter Notebook

## MAB-UCB
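
To illustrate what "pulls per arm" means here, the following is a minimal sketch of the standard UCB1 rule, not the notebook's actual code. It assumes simulated Bernoulli arms, the common exploration term `sqrt(2 * ln(t) / n_i)`, and a hypothetical function name `ucb1_pull_counts`; the notebook may use a different dataset, reward model, or exploration constant.

```python
import math
import random


def ucb1_pull_counts(arm_means, n_rounds, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return the pulls per arm.

    arm_means: assumed success probability of each arm (illustrative only).
    n_rounds:  total number of pulls to make.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    reward_sums = [0.0] * n_arms  # total reward collected from each arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once so each count is nonzero.
            arm = t - 1
        else:
            # Pick the arm with the highest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda i: reward_sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        # Simulated Bernoulli reward (an assumption of this sketch).
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        reward_sums[arm] += reward

    return counts


if __name__ == "__main__":
    pulls = ucb1_pull_counts([0.1, 0.5, 0.9], n_rounds=2000)
    print("Pulls per arm:", pulls)
```

With well-separated arm means such as these, most pulls should concentrate on the arm with the highest mean, while the weaker arms still receive a small, slowly growing number of exploratory pulls.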