reinforcement-learning-1: A Jupyter Notebook repository from HieuFromWaterloo

Overview

This repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from

Each folder corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solutions, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.

All code is written in Python 3 and uses RL environments from OpenAI Gym. Advanced techniques use TensorFlow for neural network implementations.
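As a rough illustration of the agent-environment loop that the notebooks build on, here is a minimal sketch in plain Python. It assumes the classic Gym-style interface, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`; newer Gym releases use a different return shape. `CorridorEnv` and `run_episode` are hypothetical names made up for this example and are not part of this repository or of Gym.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment (not part of this repository or of Gym).

    States are positions 0..length-1 in a corridor. The agent starts at 0 and
    the episode ends on reaching the rightmost cell. Every step yields -1.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, -1.0, done, {}


def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return the undiscounted return."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        state, reward, done, _info = env.step(policy(state))
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)  # fixed seed so the random policy is reproducible
random_return = run_episode(CorridorEnv(), lambda s: rng.choice([0, 1]))
always_right_return = run_episode(CorridorEnv(), lambda s: 1)
print(random_return, always_right_return)
```

The always-right policy needs four steps to cross the five-cell corridor, so its return is -4.0; the random policy can never do better than that.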

Table of Contents

Introduction to RL problems & OpenAI Gym
MDPs and Bellman Equations
Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration
Monte Carlo Model-Free Prediction & Control
Temporal Difference Model-Free Prediction & Control
Function Approximation
Deep Q Learning (WIP)
Policy Gradient Methods (WIP)
Learning and Planning (WIP)
Exploration and Exploitation (WIP)

List of Implemented Algorithms

[Dynamic Programming Policy Evaluation](DP/Policy Evaluation Solution.ipynb)
[Dynamic Programming Policy Iteration](DP/Policy Iteration Solution.ipynb)
[Dynamic Programming Value Iteration](DP/Value Iteration Solution.ipynb)
[Monte Carlo Prediction](MC/MC Prediction Solution.ipynb)
[Monte Carlo Control with Epsilon-Greedy Policies](MC/MC Control with Epsilon-Greedy Policies Solution.ipynb)
[Monte Carlo Off-Policy Control with Weighted Importance Sampling](MC/Off-Policy MC Control with Weighted Importance Sampling Solution.ipynb)
[SARSA (On Policy TD Learning)](TD/SARSA Solution.ipynb)
[Q-Learning (Off Policy TD Learning)](TD/Q-Learning Solution.ipynb)
[Q-Learning with Linear Function Approximation](FA/Q-Learning with Value Function Approximation Solution.ipynb)
[Deep Q-Learning for Atari Games](DQN/Deep Q Learning Solution.ipynb)
[Double Deep-Q Learning for Atari Games](DQN/Double DQN Solution.ipynb)
Deep Q-Learning with Prioritized Experience Replay (WIP)
[Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk REINFORCE with Baseline Solution.ipynb)
[Policy Gradient: Actor Critic with Baseline](PolicyGradient/CliffWalk Actor Critic Solution.ipynb)
[Policy Gradient: Actor Critic with Baseline for Continuous Action Spaces](PolicyGradient/Continuous MountainCar Actor Critic Solution.ipynb)
Deterministic Policy Gradients for Continuous Action Spaces (WIP)
Deep Deterministic Policy Gradients (DDPG) (WIP)
Asynchronous Advantage Actor Critic (A3C)
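
To give a flavour of the simplest entries in the list above, here is a minimal sketch of value iteration on a small deterministic chain MDP. It is not the code from the linked notebooks. The transition table layout `P[s][a] = [(prob, next_state, reward, done), ...]` is an assumption modelled on Gym's discrete toy environments, and `build_chain` and `value_iteration` are hypothetical helper names used only here.

```python
def build_chain(n):
    """Hypothetical chain MDP: states 0..n-1, last state terminal, -1 per step."""
    P = {}
    terminal = n - 1
    for s in range(n):
        P[s] = {}
        for a, move in enumerate((-1, 1)):  # action 0 = left, action 1 = right
            if s == terminal:
                # Terminal state is absorbing with zero reward.
                P[s][a] = [(1.0, s, 0.0, True)]
            else:
                next_s = min(max(s + move, 0), terminal)
                P[s][a] = [(1.0, next_s, -1.0, next_s == terminal)]
    return P


def value_iteration(P, n_states, n_actions, gamma=1.0, theta=1e-8):
    """Return (V, policy) for a finite MDP given as a transition table P."""
    V = [0.0] * n_states

    def q_values(s):
        # One-step lookahead: expected return of each action from state s.
        q = []
        for a in range(n_actions):
            total = 0.0
            for prob, next_s, reward, done in P[s][a]:
                total += prob * (reward + (0.0 if done else gamma * V[next_s]))
            q.append(total)
        return q

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_values(s))  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break

    # Greedy policy with respect to the converged value function.
    policy = []
    for s in range(n_states):
        q = q_values(s)
        policy.append(q.index(max(q)))
    return V, policy


V, policy = value_iteration(build_chain(4), n_states=4, n_actions=2)
print(V, policy)
```

On this four-state chain the values converge to the negated number of steps to the goal, and the greedy policy moves right from every non-terminal state.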

Resources

Textbooks:

Reinforcement Learning: An Introduction (2nd Edition)

Classes:

Talks/Tutorials:

Other Projects:

Selected Papers:
