ludobouan/Q-learning-gridworld

Reinforcement learning on gridworld with Q-learning

Q-learning-gridworld
Submission to Siraj Raval's Q-learning competition

Improvements over original code

Made the code compatible with Python 3
Changed the main loop to a more traditional episode/step structure
Added eligibility traces, using TD-lambda for the greedy policy and Watkins's Q(lambda) algorithm for the epsilon-greedy policy
Changed the bot's policy to epsilon-greedy
Logged the episode data to a CSV file so that it can be analysed later in a Jupyter notebook with matplotlib, pandas, and seaborn
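
To illustrate how the episode/step loop, the epsilon-greedy policy, and Watkins's Q(lambda) fit together, here is a minimal sketch. It is not the code in Learner.py: the 4x3 grid, the rewards (1.0 at the goal, -0.04 per step), the hyperparameters, the use of replacing traces, and every function name are assumptions made for illustration only.

```python
import random

# Hypothetical 4x3 grid (not the repository's level): start bottom-left, goal top-right.
WIDTH, HEIGHT = 4, 3
START, GOAL = (0, 2), (3, 0)
ACTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right


def step(state, action):
    """Move inside the grid; bumping into a wall leaves the agent in place."""
    x = min(max(state[0] + action[0], 0), WIDTH - 1)
    y = min(max(state[1] + action[1], 0), HEIGHT - 1)
    nxt = (x, y)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.04), done  # assumed reward scheme


def greedy_action(Q, state):
    """Index of a highest-valued action; ties go to the lowest index."""
    values = Q[state]
    return values.index(max(values))


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, greedy action otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return greedy_action(Q, state)


def train(episodes=200, alpha=0.5, gamma=0.9, lam=0.8, epsilon=0.1,
          max_steps=500, seed=0):
    rng = random.Random(seed)
    Q = {(x, y): [0.0] * len(ACTIONS)
         for x in range(WIDTH) for y in range(HEIGHT)}
    steps_per_episode = []
    for _ in range(episodes):                      # episode loop
        E = {s: [0.0] * len(ACTIONS) for s in Q}   # traces reset each episode
        state = START
        action = epsilon_greedy(Q, state, epsilon, rng)
        for t in range(1, max_steps + 1):          # step loop
            nxt, reward, done = step(state, ACTIONS[action])
            next_action = epsilon_greedy(Q, nxt, epsilon, rng)
            best = greedy_action(Q, nxt)
            # The TD error always bootstraps from the greedy action (Q-learning target).
            target = reward if done else reward + gamma * Q[nxt][best]
            delta = target - Q[state][action]
            E[state][action] = 1.0                 # replacing trace (an assumption)
            # Watkins's rule: an exploratory next action cuts all traces.
            exploratory = Q[nxt][next_action] != Q[nxt][best]
            for s in Q:
                for a in range(len(ACTIONS)):
                    Q[s][a] += alpha * delta * E[s][a]
                    E[s][a] = 0.0 if exploratory else gamma * lam * E[s][a]
            state, action = nxt, next_action
            if done:
                break
        steps_per_episode.append(t)
    return Q, steps_per_episode


if __name__ == "__main__":
    Q, steps = train()
    print("mean steps over the last 20 episodes:", sum(steps[-20:]) / 20)
```

Setting epsilon=0.0 in this sketch gives a purely greedy agent whose traces are never cut, which corresponds roughly to the greedy variant with eligibility traces.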

Comparison

Original (greedy)	Greedy with eligibility traces	Epsilon-Greedy with eligibility traces
Greedy policy; Q-values are initialized to 0.1 to induce exploration	Same greedy policy, with eligibility traces that make learning considerably faster	Epsilon-greedy policy with eligibility traces; less effective than the greedy policy with traces, possibly because the hyperparameters were not tuned
40 episodes to solution	10 episodes to solution	15 episodes to solution
Sub-optimal solution	Sub-optimal solution	Will converge to the optimal solution with the right hyperparameters
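
Figures such as "episodes to solution" can be read off the logged episode data. The sketch below shows one way to do that with the standard library alone. The column names (episode, steps), the definition of "solution" (the first episode from which every later episode finishes within a step budget), and the sample numbers are all assumptions for illustration; they are not the repository's log format or its results.

```python
import csv
import io


def write_log(rows, fileobj):
    """Write one CSV row per episode (hypothetical columns: episode, steps)."""
    writer = csv.DictWriter(fileobj, fieldnames=["episode", "steps"])
    writer.writeheader()
    writer.writerows(rows)


def episodes_to_solution(fileobj, max_steps):
    """Return the first episode from which all later episodes take at most
    max_steps steps, or None if the final episode is still over budget."""
    rows = list(csv.DictReader(fileobj))
    answer = None
    for row in reversed(rows):
        if int(row["steps"]) > max_steps:
            break
        answer = int(row["episode"])
    return answer


# Made-up step counts, used only to exercise the functions above.
sample_steps = [40, 22, 9, 12, 7, 6, 6]
log = io.StringIO()
write_log([{"episode": i + 1, "steps": s} for i, s in enumerate(sample_steps)], log)
log.seek(0)
print(episodes_to_solution(log, max_steps=8))
```

With a real log, the io.StringIO object would be replaced by a file opened with open(path, newline="").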

Usage

Run python Learner.py in a terminal

Dependencies

Tkinter
Matplotlib
Seaborn

Custom gridworld level

Credits

Siraj Raval
PhilippeMorere