This is the final project for the University of Washington course IND E 599, Data Driven Optimization.
Dynamic pricing with limited supply is a typical bandits with knapsacks (BwK) problem, a problem class that has attracted growing interest in machine learning and operations research in recent years. This course project studies a basic version of dynamic pricing with two products under a single global constraint. Traditional multi-armed bandit algorithms, a classical BwK algorithm, and reinforcement learning algorithms are compared on how well they explore and find optimal policies for this problem.
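
To make the setting concrete, here is a minimal sketch of what such an environment might look like. It is not the project's actual code. The price grid `PRICES`, the linear demand model `buy_probability`, the horizon, the budget, and the reading of the "single global constraint" as one inventory pool shared by both products are all assumptions made for illustration. Each arm is a pair of prices, and an episode ends when the horizon is reached or the shared supply runs out.

```python
import random

# Hypothetical price grid; these values are illustrative, not from the project.
PRICES = [0.2, 0.5, 0.8]


def buy_probability(price):
    # Assumed linear demand: a higher price means a lower chance of a sale.
    return 1.0 - price


def uniform_policy(arms, rng):
    # Baseline that ignores feedback and picks a price pair at random.
    return rng.choice(arms)


def run_episode(policy, horizon=1000, budget=200, seed=0):
    """Simulate one selling season with a shared inventory (assumed constraint)."""
    rng = random.Random(seed)
    # An arm is a (price of product 1, price of product 2) pair.
    arms = [(p1, p2) for p1 in PRICES for p2 in PRICES]
    revenue, remaining = 0.0, budget
    for _ in range(horizon):
        if remaining <= 0:
            break  # the knapsack (supply) is exhausted, so selling stops
        p1, p2 = policy(arms, rng)
        for price in (p1, p2):
            # Each product sells independently and consumes one shared unit.
            if remaining > 0 and rng.random() < buy_probability(price):
                revenue += price
                remaining -= 1
    return revenue, remaining


revenue, remaining = run_episode(uniform_policy)
print(f"revenue={revenue:.2f}, units left={remaining}")
```

A bandit, BwK, or reinforcement learning method would replace `uniform_policy` with a rule that uses past sales and the remaining supply to choose the next price pair.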