OTUS Machine Learning Advanced
Goals:
- Describe the environment ☑︎
- Apply the Q-learning algorithm to find the optimal policy ☑︎
- Evaluate the optimal policy ☑︎
- Visualize it ☑︎
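For orientation, here is a minimal sketch of tabular Q-learning. It is not the notebook's implementation: it assumes a hypothetical 5-state chain (action 1 moves right, action 0 moves left, reaching the last state pays a reward of 1) instead of the gym environment, and the hyperparameters `alpha`, `gamma` and `epsilon` are illustrative values only.

```python
import random


def step(state, action, n_states):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                # explore: pick a random action
                action = rng.randrange(n_actions)
            else:
                # exploit: pick a best action, breaking ties at random
                best = max(q[state])
                action = rng.choice([a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            # Q-learning target: reward plus discounted best value of the next state
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per state according to the learned Q-table."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]
```

Calling `greedy_policy(q_learning())` returns one action per state. The same update rule applies when the `step` function is replaced by a real environment's transition call.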
Means:
- All meaningful programming is done with the gym library.
Implementation:
- Jupyter notebook, intended to be run locally: the heavy animation makes a Colab run too slow.
Notes:
- Training on 10,000 episodes takes about an hour.
- To go straight to the Q-learning results, you may skip the training phase entirely and start from the experiments section that begins with "[Optionally] load Q_states".
- All cells preceding "Q-learning algo" still need to be executed.
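Skipping training only works if a trained Q-table has been stored somewhere. The sketch below shows one possible way to save and reload such a table. The storage format used by this notebook is not stated here, so `pickle`, the helper names and the file name `q_states.pkl` are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_q_table(q_table, path):
    """Serialize a Q-table (any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(q_table, f)


def load_q_table(path):
    """Load a previously saved Q-table from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip_demo():
    """Save a tiny hypothetical Q-table to a temporary file and read it back."""
    q_table = {(0, 0): [0.0, 0.5], (0, 1): [0.25, 1.0]}
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "q_states.pkl")  # hypothetical file name
        save_q_table(q_table, path)
        return load_q_table(path)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.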