This project solves n-armed bandit problems using different policies to find the path with the least regret. The policies used include policy gradient and Thompson sampling. All environments and agents are implemented with the aid of the Amalearn library. The project was carried out as part of the Reinforcement Learning graduate course offered at the University of Tehran under the supervision of Prof. Nili.
You can find all the information about each part of the project in the results section.
- Packet-routing
The task of finding the best route to transfer a packet through a congested network
- Thompson-sampling-greedy-policies
Comparing Thompson sampling and greedy policies on a 10-armed bandit task
- Waiting-monetary-value-prospect-theory
Investigating the monetary value of waiting time using Daniel Kahneman's prospect theory
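
To give a feel for the Thompson sampling versus greedy comparison, here is a minimal, self-contained sketch. It is not the project's code and does not use Amalearn. The function name `run_bandit`, the Bernoulli reward model, the Beta(1, 1) prior, and the rule that the greedy policy tries every arm once before exploiting are all assumptions made for illustration.

```python
import random


def run_bandit(policy, probs, steps, rng):
    """Play a Bernoulli bandit and return the cumulative expected regret.

    policy: "thompson" or "greedy" (hypothetical labels for this sketch)
    probs:  success probability of each arm (unknown to the agent)
    steps:  number of pulls
    rng:    a random.Random instance, so runs are reproducible
    """
    n_arms = len(probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    best = max(probs)
    regret = 0.0

    for _ in range(steps):
        if policy == "thompson":
            # Draw one sample per arm from its Beta posterior
            # (assumed Beta(1, 1) prior) and play the largest sample.
            scores = [
                rng.betavariate(1 + successes[a], 1 + failures[a])
                for a in range(n_arms)
            ]
        elif policy == "greedy":
            # Play the arm with the highest empirical mean.
            # Untried arms score +inf so each arm is pulled once first.
            scores = [
                successes[a] / (successes[a] + failures[a])
                if successes[a] + failures[a] > 0
                else float("inf")
                for a in range(n_arms)
            ]
        else:
            raise ValueError(f"unknown policy: {policy}")

        arm = max(range(n_arms), key=scores.__getitem__)

        # Bernoulli reward from the chosen arm.
        if rng.random() < probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

        # Expected regret of this pull relative to the best arm.
        regret += best - probs[arm]

    return regret


if __name__ == "__main__":
    # Ten arms with arbitrary, made-up success probabilities.
    arm_probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8]
    for name in ("thompson", "greedy"):
        total = run_bandit(name, arm_probs, 1000, random.Random(0))
        print(f"{name}: cumulative regret = {total:.1f}")
```

Regret here is measured against the arm with the highest success probability, so a policy that locks onto a suboptimal arm keeps accumulating regret on every pull. Results from a single seed are noisy; averaging over many seeds gives a fairer comparison.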