FrozenLake Q-Learner
Implementation of an RL Q-learner using gymnasium's Frozen Lake environment.
This was created to play with an RL toy problem and to understand the differences between RL approaches.
Background
The Q-learning algorithm works well for finite state and action spaces, but it stores a value for every state-action pair in a table, so memory grows with the number of states times the number of actions. When the state space, the action space, or both are continuous, tabular Q-learning is simply not applicable.
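To make the table concrete, here is a minimal sketch of tabular Q-learning on Frozen Lake (the hyperparameter values are assumptions; the repo's q-solver.py may use different ones):

```python
# Minimal tabular Q-learning sketch on FrozenLake (assumed hyperparameters).
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1")
# One row per state, one column per action: 16 x 4 for the default 4x4 map.
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed values

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        # TD update toward the greedy bootstrap target.
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
        done = terminated or truncated
```

Here the table is only 16 x 4 entries; the memory problem appears as the spaces grow, and the table stops being an option entirely once they are continuous.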
Note that for the 8x8 Frozen Lake environment, the task becomes closer to a pathfinding problem, for which RL is generally discouraged: traditional approaches such as A* work similarly or better while being far less complex.
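As an illustration of the pathfinding framing, here is a sketch that solves the deterministic (is_slippery=False) 8x8 map with plain breadth-first search, which is equivalent to A* with a zero heuristic on a uniform-cost grid; no learning is involved:

```python
# Sketch: with is_slippery=False the 8x8 map is a deterministic grid, and
# breadth-first search finds the shortest safe path directly.
from collections import deque
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=False)
grid = env.unwrapped.desc.astype(str)  # letters S, F, H, G
rows, cols = grid.shape
start = next((r, c) for r in range(rows) for c in range(cols) if grid[r, c] == "S")

queue, seen = deque([(start, 0)]), {start}
while queue:
    (r, c), dist = queue.popleft()
    if grid[r, c] == "G":
        print("shortest safe path:", dist, "moves")
        break
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen and grid[nr, nc] != "H":
            seen.add((nr, nc))
            queue.append(((nr, nc), dist + 1))
```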
Also, for larger grids, RL suffers from the well-known credit assignment problem. Frozen Lake's reward schedule is:
- Reach goal (G): +1
- Reach hole (H): 0
- Reach frozen (F): 0
Because the reward is 0 on almost every step, it is hard for the algorithm to work out which moves actually contributed to a win and which did not.
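This sparsity is easy to see by printing the reward at every step of a random episode (a quick sketch, not part of the repo's scripts):

```python
# Sketch: one random episode on the 8x8 map; almost every step yields 0.0,
# which is exactly the credit assignment problem described above.
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="8x8")
state, _ = env.reset(seed=0)
done = False
while not done:
    _, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    print(reward, end=" ")  # a long run of 0.0, with 1.0 only on reaching G
    done = terminated or truncated
```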
Installation
Recommended: create and activate a virtual environment (see the Python documentation):

```bash
python -m venv env
source env/bin/activate
```
Install the requirements:

```bash
pip install -r requirements.txt
```
Run either solver:

```bash
python q-solver.py
python sarsa.py
```
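The two solvers differ mainly in the TD target. Assuming they follow the textbook algorithms (the function names below are illustrative, not the repo's API), the core contrast is:

```python
# Sketch of the core difference: Q-learning bootstraps from the greedy next
# action, SARSA from the action the behavior policy actually takes next.
import numpy as np

def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: target uses the max over next actions.
    target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: target uses the next action the agent will really take.
    target = r + gamma * q[s_next, a_next]
    q[s, a] += alpha * (target - q[s, a])
```

Q-learning is off-policy because its target ignores the behavior policy; SARSA is on-policy because its target uses the action the policy actually selects next.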
TODO
- Add online vs. offline learning, and other ways of selecting the next action (mean, max, best action)
- Look into adding DQN and other RL approaches for different types of games
- Add notes on different architectures (e.g. ResNet), Behavior Cloning, Proximal Policy Optimization, etc.
References
Inspired by: TowardsDataScience
Also see: GeeksForGeeks: Q-learning in Python
On-policy vs. off-policy approaches: GeeksForGeeks: SARSA vs. Q-Learning