Q-Learning-Taxi-v2

Goal: Obtain a Q-table containing the optimal action values for the Taxi-v2 problem.
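Approach: Since the goal is to obtain the optimal Q-values rather than to perform well during training, I implemented an exploration-only strategy, namely random exploration. This works because Q-Learning is off-policy: it learns the values of the greedy (optimal) policy even while the agent follows a different behavior policy, here a uniformly random one.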
Related Problem: If the goal were to learn the optimal policy as quickly as possible, an exploration-exploitation strategy such as the epsilon-greedy algorithm [1] would be needed.
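A minimal sketch of tabular Q-Learning covering both strategies, assuming a generic environment given as `reset()` and `step(state, action)` callables. The function, parameter and environment names are hypothetical, not this repository's code. Setting `epsilon=1.0` gives the purely random behavior policy described above, while `epsilon < 1.0` gives epsilon-greedy. In both cases the update is the standard Q-Learning rule Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]; a constant α is used here as a simplification.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: step(state, action) -> (next_state, reward, done)
StepFn = Callable[[int, int], Tuple[int, float, bool]]


def q_learning(
    n_states: int,
    n_actions: int,
    reset: Callable[[], int],
    step: StepFn,
    episodes: int = 2000,
    alpha: float = 0.1,
    gamma: float = 0.9,
    epsilon: float = 1.0,  # 1.0 = random exploration only; < 1.0 = epsilon-greedy
    max_steps: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore: uniform random action
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])  # exploit
            s2, r, done = step(s, a)
            # Off-policy target: bootstrap from the best next action, whatever the
            # behavior policy actually does next. Terminal states add no future value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


# Toy stand-in environment: states 0..4, action 0 = left, 1 = right,
# reward 1.0 and episode end on reaching state 4.
def chain_reset() -> int:
    return 0


def chain_step(s: int, a: int) -> Tuple[int, float, bool]:
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4


q = q_learning(n_states=5, n_actions=2, reset=chain_reset, step=chain_step)
print([round(max(row), 3) for row in q[:4]])
```

The five-state chain only stands in for Taxi-v2 so that the optimal values can be checked by hand: with γ = 0.9, the value of moving right from state 0 should approach 0.9³ = 0.729, even though the agent never acts greedily during training.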

OpenAI Gym: Gym is a toolkit for developing and comparing reinforcement learning algorithms. This repository implements Q-Learning to learn the optimal Q-table for the OpenAI Gym "Taxi-v2" environment and then uses that table to solve the environment.
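Once the Q-table is learned, solving the environment means acting greedily with respect to it. The sketch below assumes the classic Gym API from the releases that still registered Taxi-v2 (`env = gym.make("Taxi-v2")`, `reset()` returning a state, `step()` returning `(state, reward, done, info)`); newer Gym and Gymnasium releases return `(state, info)` from `reset()` and a 5-tuple from `step()`, and ship Taxi-v3 instead. `solve_with_q_table` and `LineEnv` are hypothetical names, and `LineEnv` is a tiny stand-in so the example runs without Gym installed.

```python
from typing import List, Tuple


def solve_with_q_table(env, q: List[List[float]], max_steps: int = 200) -> Tuple[float, int]:
    """Run one episode with the greedy policy a = argmax_a Q[s][a].

    `env` is assumed to follow the classic Gym API:
    reset() -> state and step(action) -> (state, reward, done, info).
    Returns (total_reward, steps_taken).
    """
    state = env.reset()
    total, steps = 0.0, 0
    for steps in range(1, max_steps + 1):
        action = max(range(len(q[state])), key=lambda a: q[state][a])
        state, reward, done, _info = env.step(action)
        total += reward
        if done:
            break
    return total, steps


class LineEnv:
    """Hypothetical stand-in for Taxi-v2: walk right from cell 0 to cell 3,
    -1 per step and +20 on arrival."""

    def reset(self) -> int:
        self.s = 0
        return self.s

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.s = min(self.s + 1, 3) if action == 1 else max(self.s - 1, 0)
        done = self.s == 3
        return self.s, (20.0 if done else -1.0), done, {}


q_table = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
print(solve_with_q_table(LineEnv(), q_table))  # (18.0, 3)
```

The `max_steps` cap guards against a poorly trained table that loops forever between states.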
Taxi Environment: Taxi-v2 is a task introduced by Dietterich [2] to illustrate some issues in hierarchical reinforcement learning. There are 4 locations (labeled by different letters), and your job is to pick up the passenger at one location and drop them off at another. You receive +20 points for a successful drop-off and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions. The figure below illustrates the environment:
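The rules described above can be made concrete with a simplified model written from that description, not from Gym's source. It assumes a 5×5 grid, the landmark coordinates in `LOCS`, and the action order south, north, east, west, pickup, drop-off. The state-encoding order is an assumption, although the count matches Taxi's 500 discrete states (25 taxi cells × 5 passenger locations, including "in the taxi", × 4 destinations). The internal walls of the real map and other details are omitted, so this is not a drop-in replacement for Taxi-v2.

```python
from typing import Tuple

# Assumed landmark coordinates (row, col) for R, G, Y, B on a 5x5 grid.
LOCS = [(0, 0), (0, 4), (4, 0), (4, 3)]
SOUTH, NORTH, EAST, WEST, PICKUP, DROPOFF = range(6)
IN_TAXI = 4  # passenger index 4 means "riding in the taxi"


def encode(row: int, col: int, pass_idx: int, dest_idx: int) -> int:
    # 5 * 5 * 5 * 4 = 500 discrete states
    return ((row * 5 + col) * 5 + pass_idx) * 4 + dest_idx


def decode(s: int) -> Tuple[int, int, int, int]:
    s, dest_idx = divmod(s, 4)
    s, pass_idx = divmod(s, 5)
    row, col = divmod(s, 5)
    return row, col, pass_idx, dest_idx


def step(s: int, a: int) -> Tuple[int, float, bool]:
    """One transition: -1 per step, +20 for a correct drop-off (episode ends),
    -10 for an illegal pickup or drop-off. Walls are omitted (simplification)."""
    row, col, p, d = decode(s)
    reward, done = -1.0, False
    if a == SOUTH:
        row = min(row + 1, 4)
    elif a == NORTH:
        row = max(row - 1, 0)
    elif a == EAST:
        col = min(col + 1, 4)
    elif a == WEST:
        col = max(col - 1, 0)
    elif a == PICKUP:
        if p != IN_TAXI and (row, col) == LOCS[p]:
            p = IN_TAXI
        else:
            reward = -10.0
    elif a == DROPOFF:
        if p == IN_TAXI and (row, col) == LOCS[d]:
            p, reward, done = d, 20.0, True
        else:
            reward = -10.0
    return encode(row, col, p, d), reward, done


s = encode(0, 0, 0, 1)  # taxi at R, passenger at R, destination G
s, r, done = step(s, PICKUP)
print(decode(s), r, done)  # (0, 0, 4, 1) -1.0 False
```

An optimal episode from this start is pickup, four moves east, then drop-off, for a return of −1 × 5 + 20 = 15; that is the kind of value the learned Q-table should reflect.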

[1] Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, 1998.

[2] Dietterich, T. G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 13, 227-303, 2000.

felipelodur/Q-Learning-Taxi-v2