oort77/OTUS_ADV_HW7

Reinforcement learning: Q-learning algorithm on LunarLander-v2

Jupyter Notebook, MIT license

OTUS Machine Learning Advanced

Homework 7

Reinforcement learning: applying the Q-Learning algorithm to the Lunar Lander environment from OpenAI

Goals:

Describe the environment ☑︎
Apply the Q-Learning algorithm to find optimal policy ☑︎
Evaluate the optimal policy ☑︎
Visualize it ☑︎
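
As a rough illustration of the second goal, here is a minimal sketch of tabular Q-learning for an environment like LunarLander-v2, which has an 8-dimensional continuous observation and 4 discrete actions. It is not the notebook's code. The helper names (`discretize`, `epsilon_greedy`, `q_update`, `make_q_table`), the bin width, and the values of `alpha` and `gamma` are all assumptions made for this example.

```python
import random
from collections import defaultdict

# LunarLander-v2 actions: 0 = do nothing, 1 = left engine, 2 = main engine, 3 = right engine
N_ACTIONS = 4


def make_q_table():
    """Q-table that lazily creates a zero row for every unseen state (hypothetical helper)."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def discretize(observation, bin_width=0.5):
    """Map a continuous observation to a hashable tuple of integer bin indices.

    The bin width is an arbitrary assumption; a real run would tune it per feature.
    """
    return tuple(int(round(value / bin_width)) for value in observation)


def epsilon_greedy(q_table, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One Q-learning step: Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
```

In a training loop, each step of an episode would discretize the observation returned by the environment, choose an action with `epsilon_greedy`, and call `q_update` with the observed reward and next state.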

Means:

All meaningful programming is done with the OpenAI gym library.

Implementation

The Jupyter notebook is meant to be run locally, because the heavy animation makes it too slow on Colab.

Notes

Training on 10000 episodes takes about an hour.
To see the Q-learning results right away, you may skip the training phase entirely and
go straight to the experiments section, which starts with "[Optionally] load Q_states".
All cells preceding "Q-learning algo" still need to be executed.
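
Skipping training relies on a previously saved Q-table. A minimal sketch of how such a table could be saved and restored is shown here, assuming it is stored as a dictionary and serialized with `pickle`. The notebook's actual storage format is not stated, and the function names and the file name `q_states_example.pkl` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_q_table(q_table, path):
    """Write the Q-table to disk; dict() drops any defaultdict factory, which pickle cannot store if it is a lambda."""
    with open(path, "wb") as f:
        pickle.dump(dict(q_table), f)


def load_q_table(path):
    """Read a Q-table written by save_q_table."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip with a single made-up entry: discretized state -> one value per action
example = {(0, 3, 0, -1, 0, 0, 0, 0): [0.0, 0.0, 1.0, 0.0]}
path = Path(tempfile.gettempdir()) / "q_states_example.pkl"  # hypothetical file name
save_q_table(example, path)
restored = load_q_table(path)
```

Only load pickle files from a source you trust, since unpickling can execute arbitrary code.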