/OTUS_ADV_HW7

Reinforcement learning: Q-learning algo - LunarLander-v2

Primary language: Jupyter Notebook. License: MIT.

OTUS Machine Learning Advanced

Homework 7

Reinforcement learning: application of the Q-Learning algorithm to the Lunar Lander environment from OpenAI

Goals:

  • Describe the environment ☑︎
  • Apply the Q-Learning algorithm to find an optimal policy ☑︎
  • Evaluate the optimal policy ☑︎
  • Visualize it ☑︎
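
For orientation, the core of tabular Q-learning can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper names (discretize, choose_action, q_update), the bin width, and the values of alpha, gamma and epsilon are all assumptions chosen for the example. Lunar Lander observations are continuous, so the sketch assumes they are rounded onto a coarse grid before being used as table keys.

```python
import random
from collections import defaultdict

N_ACTIONS = 4  # LunarLander-v2 exposes four discrete actions


def discretize(observation, step=0.5):
    """Map a continuous observation to a hashable grid cell.

    The bin width `step` is a hypothetical choice for this sketch.
    """
    return tuple(int(round(x / step)) for x in observation)


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])


# Unseen states start with all-zero action values.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)
```

In a training loop, each environment step would call choose_action on the discretized observation and then q_update with the observed reward and next state.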

Means:

  • All substantive code is written against the gym library.

Implementation

  • A Jupyter notebook intended to be run locally, since the heavy animation makes it too slow on Colab.

Notes

  • Training on 10,000 episodes takes about an hour.
  • To go straight to the Q-learning results, you may skip the training phase entirely and
    start at the experiments section, which begins with "[Optionally] load Q_states".
  • All cells preceding "Q-learning algo" still need to be executed.
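
Skipping training relies on a stored Q-table being available. The sketch below shows one way such a table could be saved and restored. It assumes the table is a dictionary keyed by discretized states and that pickle is the storage format; the notebook's actual format for Q_states is not stated here, and the function names are hypothetical.

```python
import pickle
from collections import defaultdict

N_ACTIONS = 4  # LunarLander-v2 exposes four discrete actions


def new_q_table():
    """Create an empty table whose unseen states default to zero values."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def save_q_table(q_table, path):
    """Write the table to disk.

    A defaultdict built with a lambda cannot be pickled directly,
    so it is converted to a plain dict first.
    """
    with open(path, "wb") as f:
        pickle.dump(dict(q_table), f)


def load_q_table(path):
    """Read a stored table and restore the zero default for unseen states."""
    with open(path, "rb") as f:
        stored = pickle.load(f)
    q_table = new_q_table()
    q_table.update(stored)
    return q_table


def greedy_action(q_table, state):
    """Pick the highest-valued action, as used when evaluating a learned policy."""
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.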
