
Grokking Deep Reinforcement Learning

Note: At the moment, only running the code from the docker container (below) is supported. Docker allows you to create a single environment that is more likely to work on all systems. In short, I install and configure all packages for you, except docker itself, and you just run the code in a tested environment.

To install docker, I recommend a web search for "installing docker on <your os here>". To run the code on a GPU, you also have to install nvidia-docker, which allows containers to use the host's GPUs. Once you have docker (and nvidia-docker, if using a GPU) installed, follow the four steps below.

Running the code

  1. Clone this repo:
    git clone --depth 1 https://github.com/mimoralea/gdrl.git && cd gdrl
  2. Pull the gdrl image with:
    docker pull mimoralea/gdrl:v0.14
  3. Spin up a container:
    • On Mac or Linux:
      docker run -it --rm -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
    • On Windows:
      docker run -it --rm -p 8888:8888 -v %cd%/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
    • NOTE: Use nvidia-docker if you are using a GPU.
  4. Open a browser and go to the URL shown in the terminal (likely to be: http://localhost:8888). The password is: gdrl

About the book

Book's website

https://www.manning.com/books/grokking-deep-reinforcement-learning

Table of contents

  1. Introduction to deep reinforcement learning
  2. Mathematical foundations of reinforcement learning
  3. Balancing immediate and long-term goals
  4. Balancing the gathering and utilization of information
  5. Evaluating agents' behaviors
  6. Improving agents' behaviors
  7. Achieving goals more effectively and efficiently
  8. Introduction to value-based deep reinforcement learning
  9. More stable value-based methods
  10. Sample-efficient value-based methods
  11. Policy-gradient and actor-critic methods
  12. Advanced actor-critic methods
  13. Towards artificial general intelligence

Detailed table of contents

1. Introduction to deep reinforcement learning

2. Mathematical foundations of reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementations of several MDPs (a minimal transition-dictionary sketch follows this list):
      • Bandit Walk
      • Bandit Slippery Walk
      • Slippery Walk Three
      • Random Walk
      • Russell and Norvig's Gridworld from AIMA
      • FrozenLake
      • FrozenLake8x8
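
The notebook builds these environments from their MDP definitions. As a rough illustration, here is a minimal sketch of the Bandit Walk MDP written as a Gym-style transition dictionary, P[state][action] = [(prob, next_state, reward, done), ...]; the state and action numbering below is an illustrative assumption, not necessarily the notebook's exact encoding.

    # Bandit Walk: 3 states (0 = hole, 1 = start, 2 = goal), 2 actions (0 = left, 1 = right).
    # Deterministic, one-step episodes; reaching the goal pays +1.
    P = {
        0: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 0, 0.0, True)]},   # hole is absorbing
        1: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 2, 1.0, True)]},   # start: left -> hole, right -> goal (+1)
        2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},   # goal is absorbing
    }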

3. Balancing immediate and long-term goals

  • (Livebook)
  • (Notebook)
    • Implementations of methods for finding optimal policies (a Value Iteration sketch follows this list):
      • Policy Evaluation
      • Policy Improvement
      • Policy Iteration
      • Value Iteration
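
For a flavor of the kind of code this notebook develops, here is a minimal Value Iteration sketch over a Gym-style transition dictionary P (as in the chapter 2 sketch above); function and variable names, the discount factor, and the convergence threshold are illustrative assumptions.

    import numpy as np

    def value_iteration(P, gamma=0.99, theta=1e-10):
        """Sweep states, back up Q from V, and stop when V stops changing."""
        V = np.zeros(len(P), dtype=np.float64)
        while True:
            Q = np.zeros((len(P), len(P[0])), dtype=np.float64)
            for s in range(len(P)):
                for a in range(len(P[s])):
                    for prob, next_state, reward, done in P[s][a]:
                        Q[s][a] += prob * (reward + gamma * V[next_state] * (not done))
            if np.max(np.abs(V - Q.max(axis=1))) < theta:
                break
            V = Q.max(axis=1)
        pi = Q.argmax(axis=1)          # greedy policy with respect to the converged Q
        return V, pi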

4. Balancing the gathering and utilization of information

  • (Livebook)
  • (Notebook)
    • Implementations of exploration strategies for bandit problems (an E-greedy sketch follows this list):
      • Random
      • Greedy
      • E-greedy
      • E-greedy with linearly decaying epsilon
      • E-greedy with exponentially decaying epsilon
      • Optimistic initialization
      • SoftMax
      • Upper Confidence Bound
      • Bayesian
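
For a flavor of these strategies, below is a minimal sketch of E-greedy with a linearly decaying epsilon on a simple Gaussian bandit; the bandit setup and hyperparameters are illustrative assumptions, not the notebook's exact experiments.

    import numpy as np

    def e_greedy_decay(true_means, n_episodes=1000, init_epsilon=1.0, min_epsilon=0.01):
        """Epsilon-greedy with linearly decaying epsilon on a Gaussian bandit."""
        n_arms = len(true_means)
        Q = np.zeros(n_arms)                               # estimated value per arm
        N = np.zeros(n_arms)                               # pull counts
        epsilons = np.linspace(init_epsilon, min_epsilon, n_episodes)
        for e in range(n_episodes):
            if np.random.random() < epsilons[e]:
                arm = np.random.randint(n_arms)            # explore
            else:
                arm = int(np.argmax(Q))                    # exploit
            reward = np.random.normal(true_means[arm], 1.0)
            N[arm] += 1
            Q[arm] += (reward - Q[arm]) / N[arm]           # incremental sample mean
        return Q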

5. Evaluating agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementation of algorithms that solve the prediction problem (policy estimation); a TD(0) sketch follows this list:
      • On-policy first-visit Monte-Carlo prediction
      • On-policy every-visit Monte-Carlo prediction
      • Temporal-Difference prediction (TD)
      • n-step Temporal-Difference prediction (n-step TD)
      • TD(λ)
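
As a rough sketch of the prediction setting, here is tabular TD(0) policy evaluation, assuming the classic Gym API where reset() returns a state and step() returns (next_state, reward, done, info); the step size and episode count are illustrative.

    import numpy as np

    def td_prediction(env, pi, gamma=1.0, alpha=0.01, n_episodes=500):
        """TD(0): move V(s) toward the one-step bootstrapped target after every transition."""
        V = np.zeros(env.observation_space.n)
        for _ in range(n_episodes):
            state, done = env.reset(), False
            while not done:
                action = pi(state)                         # pi is a callable state -> action
                next_state, reward, done, _ = env.step(action)
                td_target = reward + gamma * V[next_state] * (not done)
                V[state] += alpha * (td_target - V[state])
                state = next_state
        return V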

6. Improving agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementation of algorithms that solve the control problem (policy improvement); a Q-learning sketch follows this list:
      • On-policy first-visit Monte-Carlo control
      • On-policy every-visit Monte-Carlo control
      • On-policy TD control: SARSA
      • Off-policy TD control: Q-Learning
      • Double Q-Learning
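
For illustration, here is a minimal tabular Q-learning sketch (off-policy TD control) under the same classic Gym API assumption; the constant epsilon and alpha are simplifications of the decaying schedules a full implementation would use.

    import numpy as np

    def q_learning(env, gamma=1.0, alpha=0.5, epsilon=0.1, n_episodes=3000):
        """Q-learning: behave epsilon-greedily, but bootstrap from the greedy next action."""
        nS, nA = env.observation_space.n, env.action_space.n
        Q = np.zeros((nS, nA))
        for _ in range(n_episodes):
            state, done = env.reset(), False
            while not done:
                if np.random.random() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                td_target = reward + gamma * Q[next_state].max() * (not done)
                Q[state][action] += alpha * (td_target - Q[state][action])
                state = next_state
        return Q, Q.max(axis=1), Q.argmax(axis=1)          # Q, V, greedy policy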

7. Achieving goals more effectively and efficiently

  • (Livebook)
  • (Notebook)
    • Implementation of more effective and efficient reinforcement learning algorithms (a SARSA(λ) sketch follows this list):
      • SARSA(λ) with replacing traces
      • SARSA(λ) with accumulating traces
      • Q(λ) with replacing traces
      • Q(λ) with accumulating traces
      • Dyna-Q
      • Trajectory Sampling
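
As an illustration of eligibility traces, here is a minimal SARSA(λ) sketch with replacing traces; the Gym API and hyperparameters are assumptions, as in the earlier sketches.

    import numpy as np

    def sarsa_lambda(env, gamma=1.0, alpha=0.1, epsilon=0.1, lambda_=0.9, n_episodes=3000):
        """SARSA(lambda) with replacing traces: every visited (s, a) shares in each TD error."""
        nS, nA = env.observation_space.n, env.action_space.n
        Q = np.zeros((nS, nA))

        def select_action(state):
            if np.random.random() < epsilon:
                return env.action_space.sample()
            return int(np.argmax(Q[state]))

        for _ in range(n_episodes):
            E = np.zeros((nS, nA))                         # eligibility traces, reset each episode
            state, done = env.reset(), False
            action = select_action(state)
            while not done:
                next_state, reward, done, _ = env.step(action)
                next_action = select_action(next_state)
                td_error = reward + gamma * Q[next_state][next_action] * (not done) - Q[state][action]
                E[state][action] = 1.0                     # replacing trace: clamp to 1, do not accumulate
                Q += alpha * td_error * E                  # update all traced pairs at once
                E *= gamma * lambda_                       # decay every trace
                state, action = next_state, next_action
        return Q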

8. Introduction to value-based deep reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementation of a value-based deep reinforcement learning baseline (a Q-network sketch follows this list):
      • Neural Fitted Q-iteration (NFQ)
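
For a flavor of this baseline, below is a minimal PyTorch sketch of a fully connected Q-network plus one fitted-Q regression step; the class and function names, layer sizes, and batch layout are illustrative assumptions, not the book's exact implementation.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    class FCQ(nn.Module):
        """Fully connected state-in, values-out Q-network."""
        def __init__(self, n_states, n_actions, hidden=(512, 128)):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_states, hidden[0]), nn.ReLU(),
                nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
                nn.Linear(hidden[1], n_actions),
            )

        def forward(self, state):
            return self.net(state)

    def nfq_fit_step(q_net, optimizer, batch, gamma=0.99):
        """One fitted-Q step: regress Q(s, a) toward r + gamma * max_a' Q(s', a')."""
        states, actions, rewards, next_states, dones = batch   # float tensors; actions is long, shape (B, 1)
        with torch.no_grad():
            max_next_q = q_net(next_states).max(dim=1, keepdim=True)[0]
            targets = rewards + gamma * max_next_q * (1 - dones)
        q_sa = q_net(states).gather(1, actions)
        loss = nn.functional.mse_loss(q_sa, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    q_net = FCQ(n_states=4, n_actions=2)                       # e.g. CartPole-sized input/output
    optimizer = optim.RMSprop(q_net.parameters(), lr=0.0005)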

9. More stable value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of "classic" value-based deep reinforcement learning methods (a double-DQN target sketch follows this list):
      • Deep Q-Networks (DQN)
      • Double Deep Q-Networks (DDQN)
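
As a short illustration of the key DDQN change, here is a sketch of the double-DQN target computation, reusing Q-networks like the FCQ sketch above; the function name and tensor shapes are assumptions.

    import torch

    def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
        """Double DQN: the online network selects the next action, the target network evaluates it."""
        with torch.no_grad():
            best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection
            next_q = target_net(next_states).gather(1, best_actions)             # evaluation
            return rewards + gamma * next_q * (1 - dones)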

10. Sample-efficient value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of main improvements for value-based deep reinforcement learning methods (a dueling-network sketch follows this list):
      • Dueling Deep Q-Networks (Dueling DQN)
      • Prioritized Experience Replay (PER)
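
For illustration, here is a minimal sketch of a dueling Q-network head; layer sizes and names are assumptions, and prioritized replay is left out for brevity.

    import torch.nn as nn

    class DuelingFCQ(nn.Module):
        """Shared trunk split into a state-value stream and an advantage stream."""
        def __init__(self, n_states, n_actions, hidden=128):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(n_states, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)                  # V(s)
            self.advantage = nn.Linear(hidden, n_actions)      # A(s, a)

        def forward(self, state):
            features = self.trunk(state)
            v, a = self.value(features), self.advantage(features)
            # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), which keeps the two streams identifiable
            return v + a - a.mean(dim=1, keepdim=True)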

11. Policy-gradient and actor-critic methods

  • (Livebook)
  • (Notebook)
    • Implementation of classic policy-based and actor-critic deep reinforcement learning methods (a REINFORCE sketch follows this list):
      • Policy Gradients without value function and Monte-Carlo returns (REINFORCE)
      • Policy Gradients with value function baseline trained with Monte-Carlo returns (VPG)
      • Asynchronous Advantage Actor-Critic (A3C)
      • Generalized Advantage Estimation (GAE)
      • [Synchronous] Advantage Actor-Critic (A2C)
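
As a rough sketch of the simplest method in this chapter, here is a REINFORCE loss for a single episode; it assumes the per-step log-probabilities were collected from a torch.distributions.Categorical policy during the rollout, and it deliberately uses no value-function baseline.

    import torch

    def reinforce_loss(log_probs, rewards, gamma=0.99):
        """REINFORCE: -sum_t log pi(a_t | s_t) * G_t, with Monte-Carlo returns G_t."""
        returns, G = [], 0.0
        for r in reversed(rewards):                        # build discounted returns back to front
            G = r + gamma * G
            returns.append(G)
        returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
        log_probs = torch.stack(log_probs)                 # one scalar log-prob tensor per time step
        return -(log_probs * returns).sum()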

12. Advanced actor-critic methods

  • (Livebook)
  • (Notebook)
    • Implementation of advanced actor-critic methods (a PPO-loss sketch follows this list):
      • Deep Deterministic Policy Gradient (DDPG)
      • Twin Delayed Deep Deterministic Policy Gradient (TD3)
      • Soft Actor-Critic (SAC)
      • Proximal Policy Optimization (PPO)
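
For a flavor of the policy-update side of PPO, here is a minimal sketch of the clipped surrogate loss; argument names and the clipping coefficient are illustrative assumptions.

    import torch

    def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        """PPO clipped objective (negated for minimization): clip the probability ratio
        so a single update cannot push the policy too far from the data-collecting policy."""
        ratios = torch.exp(new_log_probs - old_log_probs.detach())
        unclipped = ratios * advantages
        clipped = torch.clamp(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()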

13. Towards artificial general intelligence