A small demonstration of three simple algorithms for the multi-armed bandit problem.
Requires:
- Python 3
- pandas
- NumPy
Run:
python3 demo.py
Outputs:
# Strategies
explore 7.682451493483219
exploit 8.968238407784087
greedy 9.488354874608259
optimal 10.052427913089582
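For orientation, here is a minimal sketch of how strategies with these names could be implemented. It is not the code in demo.py. It assumes that "explore" means always pulling a random arm, "exploit" means trying each arm once and then always pulling the arm with the best estimated mean, "greedy" means epsilon-greedy, and "optimal" means always pulling the arm with the highest true mean. The arm means, the noise level, the value of epsilon, and all function names are hypothetical.

```python
import numpy as np


def run_strategy(choose_arm, true_means, n_rounds, rng):
    """Play n_rounds with the given strategy; return the average reward per round."""
    n_arms = len(true_means)
    counts = np.zeros(n_arms)  # number of pulls per arm
    totals = np.zeros(n_arms)  # summed reward per arm
    for t in range(n_rounds):
        arm = choose_arm(t, counts, totals, rng)
        # Assumption: rewards are Gaussian around the arm's true mean.
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        totals[arm] += reward
    return totals.sum() / n_rounds


def explore(t, counts, totals, rng):
    # Assumed meaning: always pull a uniformly random arm.
    return int(rng.integers(len(counts)))


def exploit(t, counts, totals, rng):
    # Assumed meaning: pull each arm once, then always take the best estimate.
    if t < len(counts):
        return t
    return int(np.argmax(totals / counts))


def epsilon_greedy(t, counts, totals, rng, epsilon=0.1):
    # Assumed meaning of "greedy": explore with probability epsilon,
    # otherwise pull the arm with the best estimated mean.
    if t < len(counts):
        return t
    if rng.random() < epsilon:
        return int(rng.integers(len(counts)))
    return int(np.argmax(totals / counts))


def compare_strategies(seed=0, n_rounds=10_000):
    """Return the average reward per round for each strategy."""
    rng = np.random.default_rng(seed)
    true_means = np.array([1.0, 3.0, 5.0, 7.0, 10.0])  # hypothetical arms
    strategies = {"explore": explore, "exploit": exploit, "greedy": epsilon_greedy}
    results = {
        name: run_strategy(fn, true_means, n_rounds, rng)
        for name, fn in strategies.items()
    }
    # "optimal" is the expected reward of always pulling the best arm.
    results["optimal"] = float(true_means.max())
    return results


for name, value in compare_strategies().items():
    print(name, value)
```

With these assumptions, the random strategy should average close to the mean of all arms, while the strategies that favour the best estimated arm should come much closer to the optimal value.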
Several different algorithms exist for the multi-armed bandit problem.
Here are some videos that introduce the topic. I have based this demo on the first one.