This project explores the implementation and comparison of two popular multi-armed bandit algorithms: Epsilon Greedy and Thompson Sampling. Multi-armed bandit problems involve a trade-off between exploration (trying different arms to learn their rewards) and exploitation (choosing the arm that currently looks best). These algorithms are commonly used in scenarios such as online advertising and recommendation systems.
- Create a Bandit Class: Implement a class to represent the bandit arms, which is used by both the Epsilon Greedy and Thompson Sampling algorithms.
- Create EpsilonGreedy() and ThompsonSampling() Classes and Methods: Implement classes and methods for the Epsilon Greedy and Thompson Sampling algorithms, both of which inherit from the Bandit class (a sketch follows below).
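A minimal sketch of how this hierarchy might look; the attribute and method names here (`pull`, `update`, `true_mean`) are illustrative assumptions, not necessarily the repository's actual interface:

```python
from abc import ABC, abstractmethod

class Bandit(ABC):
    """Abstract base class for a single bandit arm (assumed interface)."""

    def __init__(self, true_mean):
        self.true_mean = true_mean    # ground-truth mean reward of this arm
        self.estimated_mean = 0.0     # running estimate learned from pulls
        self.n_pulls = 0              # how many times this arm was pulled

    @abstractmethod
    def pull(self):
        """Draw one reward from the arm's reward distribution."""

    @abstractmethod
    def update(self, reward):
        """Update the arm's estimates with an observed reward."""
```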
- Epsilon Greedy:
  - Implement an Epsilon Greedy class that decays epsilon as 1/t.
  - Design the experiment using the Epsilon Greedy algorithm (see the sketch after this list).
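A sketch of one way the 1/t decay could be implemented, assuming the `Bandit` base class above and Gaussian rewards with unit variance; `run_epsilon_greedy` and its parameters are hypothetical names:

```python
import numpy as np

class EpsilonGreedy(Bandit):
    """One Gaussian arm used inside an epsilon-greedy experiment."""

    def pull(self):
        # Reward drawn from N(true_mean, 1); unit variance is an assumption.
        return np.random.randn() + self.true_mean

    def update(self, reward):
        # Incremental update of the sample-mean estimate.
        self.n_pulls += 1
        self.estimated_mean += (reward - self.estimated_mean) / self.n_pulls

def run_epsilon_greedy(true_means, n_trials=20000):
    """Epsilon-greedy experiment with epsilon decayed as 1/t."""
    arms = [EpsilonGreedy(m) for m in true_means]
    rewards = np.empty(n_trials)
    for t in range(1, n_trials + 1):
        eps = 1.0 / t  # decaying exploration rate
        if np.random.random() < eps:
            chosen = np.random.randint(len(arms))  # explore: random arm
        else:
            # Exploit: arm with the highest estimated mean so far.
            chosen = int(np.argmax([a.estimated_mean for a in arms]))
        reward = arms[chosen].pull()
        arms[chosen].update(reward)
        rewards[t - 1] = reward
    return rewards, arms
```

With, for example, `run_epsilon_greedy([1.0, 2.0, 3.0])`, the running-average reward should approach the best arm's mean of 3 as exploration decays.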
- Thompson Sampling:
  - Implement a Thompson Sampling class with known precision.
  - Design the experiment using the Thompson Sampling algorithm (see the sketch after this list).
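A sketch of Gaussian Thompson Sampling with known precision `tau`, again building on the `Bandit` base class above; the conjugate normal prior N(0, 1) on each arm's mean and the helper names are assumptions:

```python
import numpy as np

class ThompsonSampling(Bandit):
    """One Gaussian arm with known precision; Bayesian posterior over its mean."""

    def __init__(self, true_mean, tau=1.0):
        super().__init__(true_mean)
        self.tau = tau           # known precision of the reward noise
        self.m = 0.0             # posterior mean of the arm's mean reward
        self.lambda_ = 1.0       # posterior precision (prior precision = 1)
        self.sum_rewards = 0.0

    def pull(self):
        # Reward drawn from N(true_mean, 1/tau).
        return np.random.randn() / np.sqrt(self.tau) + self.true_mean

    def sample(self):
        # One draw from the posterior N(m, 1/lambda_) over the mean.
        return np.random.randn() / np.sqrt(self.lambda_) + self.m

    def update(self, reward):
        # Conjugate Gaussian update with known precision and zero prior mean.
        self.lambda_ += self.tau
        self.sum_rewards += reward
        self.m = self.tau * self.sum_rewards / self.lambda_
        self.n_pulls += 1

def run_thompson(true_means, n_trials=20000):
    """Thompson Sampling experiment: sample from each posterior, pull the argmax."""
    arms = [ThompsonSampling(m) for m in true_means]
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        chosen = int(np.argmax([a.sample() for a in arms]))
        reward = arms[chosen].pull()
        arms[chosen].update(reward)
        rewards[t] = reward
    return rewards, arms
```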
- Report:
  - Visualize the Learning Process: Implement a method `plot1()` to visualize the learning process for each algorithm (a sketch of the report helpers follows this list).
  - Visualize Cumulative Rewards: Compare and visualize cumulative rewards from Epsilon Greedy and Thompson Sampling.
  - Store the Rewards in a CSV File: Create CSV files that store experiment data, including Bandit, Reward, and Algorithm columns (as sketched below).
  - Print Cumulative Reward: Display the cumulative reward achieved by each algorithm.
  - Print Cumulative Regret: Display the cumulative regret of each algorithm.
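A sketch of what the visualization helpers might look like; `plot1` follows the description above (a learning curve per algorithm), while `plot_cumulative_rewards` and all signatures are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot1(rewards, optimal_mean, label):
    """Plot the running-average reward against the optimal mean (log-scale x-axis)."""
    running_avg = np.cumsum(rewards) / (np.arange(len(rewards)) + 1)
    plt.plot(running_avg, label=label)
    plt.axhline(optimal_mean, linestyle="--", color="gray", label="optimal mean")
    plt.xscale("log")
    plt.xlabel("Trial")
    plt.ylabel("Average reward")
    plt.legend()
    plt.show()

def plot_cumulative_rewards(eg_rewards, ts_rewards):
    """Compare cumulative rewards of the two algorithms on one figure."""
    plt.plot(np.cumsum(eg_rewards), label="Epsilon Greedy")
    plt.plot(np.cumsum(ts_rewards), label="Thompson Sampling")
    plt.xlabel("Trial")
    plt.ylabel("Cumulative reward")
    plt.legend()
    plt.show()
```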
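And a hedged sketch of the CSV storage and the cumulative reward/regret printouts; the column names Bandit, Reward, and Algorithm come from the description above, everything else is illustrative:

```python
import csv

def save_rewards_csv(records, path="rewards.csv"):
    """Store one row per pull: (bandit index, reward, algorithm name)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Bandit", "Reward", "Algorithm"])
        writer.writerows(records)

def report_metrics(rewards, optimal_mean, name):
    """Print cumulative reward and cumulative regret for one algorithm."""
    cumulative_reward = float(sum(rewards))
    # Regret: gap between always pulling the best arm and what was actually earned.
    cumulative_regret = optimal_mean * len(rewards) - cumulative_reward
    print(f"{name}: cumulative reward = {cumulative_reward:.2f}, "
          f"cumulative regret = {cumulative_regret:.2f}")
```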
Files:
- `bandit.py`: Defines the abstract class for bandit arms, the Epsilon Greedy and Thompson Sampling algorithm implementations, and functions for visualizing experiment results.
- `Report.ipynb`: A report of the experiment results.