
My solutions to the Practical Reinforcement Learning course by Coursera/HSE.

Primary language: Jupyter Notebook. License: MIT.



My solutions to the Practical Reinforcement Learning course offered on Coursera by the National Research University Higher School of Economics (HSE). It is course 4 of 7 in the Advanced Machine Learning Specialization.

Each week's assignments are in that week's folder, as commented Jupyter notebooks alongside the quizzes.


  • Week 1:
    • OpenAI Gym Environment
    • MDPs
    • Cross-entropy method
    • Evolution strategies
  • Week 2:
    • Rewards
    • Bellman equations
    • Policy/Value iteration
  • Week 3:
    • Model-free learning
    • Q-learning
    • Exploration vs exploitation
    • Monte Carlo vs Temporal Difference
    • On-policy vs off-policy
    • Experience Replay
  • Week 4:
    • Approximate Value Based Methods
    • Loss functions
    • CartPole
    • Deep Q-Learning (DQN)
  • Week 5:
    • Policy-based methods
    • Policy-based RL vs Value-based RL
    • Actor-critic
    • A3C
  • Week 6:
    • Exploration
    • Measuring
    • Uncertainty-based exploration
    • Bandits
    • Monte Carlo Tree Search
    • seq2seq
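
As a rough illustration of the Q-learning topic from Week 3 (not taken from the notebooks in this repository), here is a minimal tabular sketch in plain Python. The five-state chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions made up for this example; the course assignments use OpenAI Gym environments instead.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP.

    States are 0..n_states-1. Action 0 moves left (state 0 stays put),
    action 1 moves right. Reaching the last state yields reward 1 and
    ends the episode; every other transition yields reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])

            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0

            # Q-learning update: bootstrap from the greedy value of the
            # next state (off-policy), or from the reward alone at the end.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


q_table = q_learning_chain()
# Greedy action per non-terminal state (0 = left, 1 = right).
greedy_policy = [max((0, 1), key=lambda a: row[a]) for row in q_table[:-1]]
print(greedy_policy)
```

The update bootstraps from `max(q[next_state])` rather than from the action the agent actually takes next, which is what makes this sketch off-policy.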