Multi-arm Bandits Exploration

This is a bandit experiment that implements different exploration techniques for the 10-armed testbed described in Reinforcement Learning: An Introduction by Sutton & Barto.
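For orientation, here is a minimal sketch of a 10-armed testbed in the style of Sutton & Barto: each arm's true value is drawn from a standard normal distribution, and each pull returns that value plus unit-variance Gaussian noise. The class name `TenArmTestbed` and its methods are hypothetical names chosen for illustration and are not taken from this repository.

```python
import numpy as np


class TenArmTestbed:
    """Stationary k-armed bandit (hypothetical helper, not the repo's code).

    True action values q*(a) are sampled from N(0, 1) once at construction;
    each reward is sampled from N(q*(a), 1).
    """

    def __init__(self, k=10, seed=None):
        self.rng = np.random.default_rng(seed)
        self.k = k
        # One fixed true value per arm.
        self.q_true = self.rng.normal(0.0, 1.0, size=k)

    def optimal_arm(self):
        """Index of the arm with the highest true value."""
        return int(np.argmax(self.q_true))

    def pull(self, arm):
        """Sample a noisy reward for the chosen arm."""
        return float(self.rng.normal(self.q_true[arm], 1.0))
```

A run would typically create many independent testbeds (different seeds) and average the results, since any single bandit instance is noisy.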

The exploration techniques covered include:

ε-greedy
Optimistic Initialization
UCB Exploration
Boltzmann (Softmax) Exploration
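
The four techniques listed above can be sketched as action-selection rules over an array of value estimates `q`. This is a rough illustration under stated assumptions, not the repository's implementation: the function names, the default `c=2.0`, and the default `initial_value=5.0` are illustrative choices, and `counts` is assumed to be a NumPy array of per-arm pull counts.

```python
import numpy as np


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))


def optimistic_estimates(k, initial_value=5.0):
    """Optimistic initialization: start every estimate well above the
    plausible rewards, so a purely greedy agent still tries every arm."""
    return np.full(k, initial_value, dtype=float)


def ucb(q, counts, t, c=2.0):
    """Upper confidence bound: estimate plus an uncertainty bonus.

    t is the 1-indexed time step. Arms never pulled are tried first,
    which also avoids dividing by zero.
    """
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    return int(np.argmax(q + c * np.sqrt(np.log(t) / counts)))


def boltzmann(q, temperature, rng):
    """Softmax exploration: sample an arm with probability proportional
    to exp(q / temperature)."""
    z = (q - np.max(q)) / temperature  # shift by the max for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))
    return int(rng.choice(len(q), p=probs))
```

Each rule only decides which arm to pull; the estimates themselves would be updated separately, for example with an incremental sample average after every reward.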

The experiment also compares these exploration techniques and draws conclusions about which one works better in different settings.

ruqoyyasadiq/deep_RL-multi-arm-bandit-exploration
