- Python v3.6.5
- Anaconda
- Run
conda create --name a4
- Run
source activate a4
- Run
conda install --yes --file requirements.txt
- Run
pip install gym pyglet
This experiment compares the convergence and performance of value iteration and policy iteration on three MDPs:
- FrozenLake-v0
- FrozenLake8x8-v0
- Taxi-v2
Reproduce the results by running:
python analysis/mdp.py
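For readers unfamiliar with the two planning algorithms, here is a minimal sketch of both. It is not the code in analysis/mdp.py. It runs on a hypothetical 4-state chain MDP (`build_chain` is made up for illustration) whose transition table uses the `P[state][action]` layout of `(probability, next_state, reward, done)` tuples that Gym's toy text environments also expose. The discount `gamma` and tolerance `tol` are assumed values.

```python
# Sketch only: value iteration vs. policy iteration on a hypothetical chain MDP.
# P[s][a] is a list of (probability, next_state, reward, done) tuples.

def build_chain(n_states=4):
    """Hypothetical MDP: action 0 moves left, action 1 moves right.
    Moves succeed with probability 0.8; otherwise the agent stays put.
    Entering the last state pays reward 1 and ends the episode."""
    goal = n_states - 1
    P = {}
    for s in range(n_states):
        P[s] = {}
        for a in (0, 1):
            if s == goal:
                P[s][a] = [(1.0, s, 0.0, True)]  # absorbing terminal state
                continue
            target = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if target == goal else 0.0
            P[s][a] = [(0.8, target, reward, target == goal),
                       (0.2, s, 0.0, False)]
    return P


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + (0.0 if done else gamma * V[s2]))
               for p, s2, r, done in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    V = [0.0] * len(P)
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P]
    return V, policy, sweeps


def policy_iteration(P, gamma=0.9, tol=1e-8):
    V = [0.0] * len(P)
    policy = [0] * len(P)
    rounds = 0
    while True:
        rounds += 1
        # Policy evaluation: iterate until V matches the current policy.
        while True:
            delta = 0.0
            for s in P:
                v = q_value(P, V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch only on a strict gain, so ties keep
        # the current action and the loop terminates.
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            current = q_value(P, V, s, policy[s], gamma)
            if q_value(P, V, s, best, gamma) > current + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            break
    return V, policy, rounds


P = build_chain()
V_vi, policy_vi, sweeps = value_iteration(P)
V_pi, policy_pi, rounds = policy_iteration(P)
print("value iteration: ", policy_vi, "after", sweeps, "sweeps")
print("policy iteration:", policy_pi, "after", rounds, "improvement rounds")
```

Both functions should arrive at the same policy. The sweep and round counts printed at the end are one simple way to compare convergence, though the repository's own analysis may measure it differently.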
Q-learning, a reinforcement learning algorithm, was also applied to the "Toy Text" environments. Reproduce the results by running:
python frozen_lake/q_learning.py
python taxi/q_learning.py
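As a rough illustration of the technique, the following is a minimal tabular Q-learning sketch. It does not reproduce the scripts above: the corridor environment (`step`, `GOAL`) is hypothetical and stands in for a Gym environment so the example runs without extra dependencies, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are assumed values, not the ones used in the experiments.

```python
import random

# Hypothetical deterministic corridor: states 0..4, start at 0, goal at 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # action 0 = left, action 1 = right


def step(s, a):
    """Return (next_state, reward, done) for the hypothetical corridor."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done


def choose_action(Q, s, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    best = max(Q[s])
    return rng.choice([a for a in range(N_ACTIONS) if Q[s][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = choose_action(Q, s, epsilon, rng)
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the greedy value of the
            # next state, unless the episode has ended.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


Q = q_learning()
greedy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)]
print("greedy action per non-terminal state:", greedy)
```

Unlike value iteration and policy iteration, this learner never reads the transition model; it only uses sampled `(state, action, reward, next_state)` transitions, which is why it can be applied to an environment through its step interface alone.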