/reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

Primary language: Jupyter Notebook. License: MIT.

Overview

This repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from Sutton's book and David Silver's course.

Each folder in this repository corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solutions, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.

All code is written in Python 3 and uses RL environments from OpenAI Gym. The more advanced techniques use TensorFlow for their neural network implementations.
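The notebooks revolve around an agent/environment loop. The following is a minimal sketch of that loop, assuming the classic Gym-style interface in which `reset()` returns a state and `step(action)` returns `(next_state, reward, done, info)`. To stay self-contained it does not import Gym; `ChainEnv` and `run_episode` are hypothetical stand-ins, not part of this repository. Note that newer Gym/Gymnasium releases changed these signatures (`reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`).

```python
class ChainEnv:
    """Hypothetical toy environment with a classic Gym-style interface.

    States are 0..n_states-1; the episode starts at 0 and ends with
    reward 1.0 when the last state is reached.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped at both ends).
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Roll out one episode; return (total reward, number of steps taken)."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps
```

For example, `run_episode(ChainEnv(), lambda s: 1)` always moves right and finishes in four steps with a total reward of 1.0. Swapping `ChainEnv()` for a real Gym environment should only require matching the `step()` return signature of the installed version.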


List of Implemented Algorithms

  • [Dynamic Programming Policy Evaluation](DP/Policy Evaluation Solution.ipynb)
  • [Dynamic Programming Policy Iteration](DP/Policy Iteration Solution.ipynb)
  • [Dynamic Programming Value Iteration](DP/Value Iteration Solution.ipynb)
  • [Monte Carlo Prediction](MC/MC Prediction Solution.ipynb)
  • [Monte Carlo Control with Epsilon-Greedy Policies](MC/MC Control with Epsilon-Greedy Policies Solution.ipynb)
  • [Monte Carlo Off-Policy Control with Weighted Importance Sampling](MC/Off-Policy MC Control with Weighted Importance Sampling Solution.ipynb)
  • [SARSA (On Policy TD Learning)](TD/SARSA Solution.ipynb)
  • [Q-Learning (Off Policy TD Learning)](TD/Q-Learning Solution.ipynb)
  • [Q-Learning with Linear Function Approximation](FA/Q-Learning with Value Function Approximation Solution.ipynb)
  • [Deep Q-Learning for Atari Games](DQN/Deep Q Learning Solution.ipynb)
  • [Double Deep Q-Learning for Atari Games](DQN/Double DQN Solution.ipynb)
  • Deep Q-Learning with Prioritized Experience Replay (WIP)
  • [Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk REINFORCE with Baseline Solution.ipynb)
  • [Policy Gradient: Actor Critic with Baseline](PolicyGradient/CliffWalk Actor Critic Solution.ipynb)
  • [Policy Gradient: Actor Critic with Baseline for Continuous Action Spaces](PolicyGradient/Continuous MountainCar Actor Critic Solution.ipynb)
  • Deterministic Policy Gradients for Continuous Action Spaces (WIP)
  • Deep Deterministic Policy Gradients (DDPG) (WIP)
  • Asynchronous Advantage Actor Critic (A3C)
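The tabular TD methods in the list share a small core. What follows is a minimal sketch of tabular Q-learning with an epsilon-greedy behavior policy; it is not the notebooks' code. It assumes a hypothetical `step_fn(state, action) -> (next_state, reward, done)` interface and a hypothetical five-state chain (`chain_step`) in place of a Gym environment. The hyperparameter defaults are illustrative only.

```python
import random
from collections import defaultdict


def chain_step(state, action, n_states=5):
    """Hypothetical toy MDP: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1.0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(step_fn, n_actions, start_state, num_episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0] * n_actions)

    def epsilon_greedy(state):
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        values = Q[state]
        best = max(values)
        # Break ties randomly so all-zero (unvisited) states are explored.
        return rng.choice([a for a, v in enumerate(values) if v == best])

    for _ in range(num_episodes):
        state = start_state
        for _ in range(max_steps):
            action = epsilon_greedy(state)
            next_state, reward, done = step_fn(state, action)
            # Off-policy TD target: bootstrap from the best next action,
            # regardless of which action the behavior policy takes next.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            if done:
                break
            state = next_state
    return Q
```

Calling `q_learning(chain_step, n_actions=2, start_state=0)` should produce a greedy policy that moves right in every non-terminal state, with `Q[s][1]` approaching `0.9 ** (3 - s)`. The `max` in the target is what makes this off-policy; SARSA would instead bootstrap from the action the epsilon-greedy policy actually selects in `next_state`.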
