You are given a slot machine with multiple arms, each of which pays out rewards from a different, unknown distribution. With a fixed budget of $100, how do you maximize your total reward as quickly as possible?
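In short, the multi-armed bandit problem:
- is a classic problem in probability theory
- formalizes the exploration-exploitation dilemma: trying other arms to learn their payouts versus pulling the arm that currently looks best
- is a simple form of reinforcement learning (a single state, with no state transitions)
- aims to maximize the cumulative reward over a limited number of pulls (equivalently, to minimize regret relative to always pulling the best arm)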
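To make the trade-off concrete, here is a minimal sketch of epsilon-greedy, one common (and deliberately simple) bandit strategy. It is not the only approach. The sketch rests on three assumptions:

* The $100 budget is treated as 100 pulls at $1 each.
* Each arm pays 1 or 0 with a hypothetical win probability that the player does not know (a Bernoulli arm).
* `epsilon_greedy` and its parameters are illustrative names.

On each pull, the strategy explores a random arm with probability `epsilon`; otherwise it exploits the arm with the best estimated payout so far.

```python
import random


def epsilon_greedy(true_probs, budget=100, epsilon=0.1, seed=0):
    """Spend `budget` pulls on Bernoulli arms using epsilon-greedy.

    Assumption: each pull costs $1, and arm i pays 1 with hypothetical
    probability true_probs[i] (hidden from the strategy), else 0.
    Returns (total_reward, pull_counts, estimated_values).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms      # how many times each arm has been pulled
    values = [0.0] * n_arms    # running mean reward observed for each arm
    total_reward = 0

    for t in range(budget):
        if t < n_arms:
            arm = t  # pull every arm once to seed the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest estimated payout
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = 1 if rng.random() < true_probs[arm] else 0  # Bernoulli payout
        counts[arm] += 1
        # incremental update of the sample mean for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return total_reward, counts, values


if __name__ == "__main__":
    # Three hypothetical arms; the third is the best but the player doesn't know it.
    total, counts, values = epsilon_greedy([0.2, 0.5, 0.7], budget=100)
    print("total reward:", total)
    print("pulls per arm:", counts)
    print("estimated payouts:", [round(v, 2) for v in values])
```

A larger `epsilon` spends more of the budget learning about the arms, and a smaller one commits earlier to the current best guess. Choosing that balance is exactly the explore-exploit dilemma described above.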