
Grokking Deep Reinforcement Learning

Note: At the moment, only running the code from the docker container (below) is supported. Docker allows you to create a single environment that is more likely to work on all systems. In short, I install and configure all packages for you, except docker itself, and you just run the code in a tested environment.

To install docker, I recommend a web search for "installing docker on <your os here>". To run the code on a GPU, you also have to install nvidia-docker, which allows containers to use the host's GPUs. Once you have docker (and nvidia-docker, if using a GPU) installed, follow the four steps below.

Running the code

  1. Clone this repo:
    git clone --depth 1 https://github.com/mimoralea/gdrl.git && cd gdrl
  2. Pull the gdrl image with:
    docker pull mimoralea/gdrl:v0.14
  3. Spin up a container:
    • On Mac or Linux:
      docker run -it --rm -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
    • On Windows:
      docker run -it --rm -p 8888:8888 -v %cd%/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
    • NOTE: Use nvidia-docker if you are using a GPU.
  4. Open a browser and go to the URL shown in the terminal (likely to be: http://localhost:8888). The password is: gdrl

About the book

Book's website

https://www.manning.com/books/grokking-deep-reinforcement-learning

Table of contents

  1. Introduction to deep reinforcement learning
  2. Mathematical foundations of reinforcement learning
  3. Balancing immediate and long-term goals
  4. Balancing the gathering and utilization of information
  5. Evaluating agents' behaviors
  6. Improving agents' behaviors
  7. Achieving goals more effectively and efficiently
  8. Introduction to value-based deep reinforcement learning
  9. More stable value-based methods
  10. Sample-efficient value-based methods
  11. Policy-gradient and actor-critic methods
  12. Advanced actor-critic methods
  13. Towards artificial general intelligence

Detailed table of contents

1. Introduction to deep reinforcement learning

2. Mathematical foundations of reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementations of several MDPs (a minimal transition-dictionary sketch follows this list):
      • Bandit Walk
      • Bandit Slippery Walk
      • Slippery Walk Three
      • Random Walk
      • Russell and Norvig's Gridworld from AIMA
      • FrozenLake
      • FrozenLake8x8
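
The notebook builds these environments from their MDP definitions. As a rough illustration, here is a minimal sketch of the Bandit Walk MDP written as a Gym-style transition dictionary, P[state][action] = [(prob, next_state, reward, done), ...]; the state and action numbering below is an illustrative assumption, not necessarily the notebook's exact encoding.

    # Bandit Walk: 3 states (0 = hole, 1 = start, 2 = goal), 2 actions (0 = left, 1 = right).
    # Deterministic, one-step episodes; reaching the goal pays +1.
    P = {
        0: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 0, 0.0, True)]},   # hole is absorbing
        1: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 2, 1.0, True)]},   # start: left -> hole, right -> goal (+1)
        2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},   # goal is absorbing
    }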

3. Balancing immediate and long-term goals

  • (Livebook)
  • (Notebook)
    • Implementations of methods for finding optimal policies (a Value Iteration sketch follows this list):
      • Policy Evaluation
      • Policy Improvement
      • Policy Iteration
      • Value Iteration
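
For a flavor of the kind of code this notebook develops, here is a minimal Value Iteration sketch over a Gym-style transition dictionary P (as in the chapter 2 sketch above); function and variable names, the discount factor, and the convergence threshold are illustrative assumptions.

    import numpy as np

    def value_iteration(P, gamma=0.99, theta=1e-10):
        """Sweep states, back up Q from V, and stop when V stops changing."""
        V = np.zeros(len(P), dtype=np.float64)
        while True:
            Q = np.zeros((len(P), len(P[0])), dtype=np.float64)
            for s in range(len(P)):
                for a in range(len(P[s])):
                    for prob, next_state, reward, done in P[s][a]:
                        Q[s][a] += prob * (reward + gamma * V[next_state] * (not done))
            if np.max(np.abs(V - Q.max(axis=1))) < theta:
                break
            V = Q.max(axis=1)
        pi = Q.argmax(axis=1)          # greedy policy with respect to the converged Q
        return V, pi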

4. Balancing the gathering and utilization of information

  • (Livebook)
  • (Notebook)
    • Implementations of exploration strategies for bandit problems (an E-greedy sketch follows this list):
      • Random
      • Greedy
      • E-greedy
      • E-greedy with linearly decaying epsilon
      • E-greedy with exponentially decaying epsilon
      • Optimistic initialization
      • SoftMax
      • Upper Confidence Bound
      • Bayesian
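
For a flavor of these strategies, below is a minimal sketch of E-greedy with a linearly decaying epsilon on a simple Gaussian bandit; the bandit setup and hyperparameters are illustrative assumptions, not the notebook's exact experiments.

    import numpy as np

    def e_greedy_decay(true_means, n_episodes=1000, init_epsilon=1.0, min_epsilon=0.01):
        """Epsilon-greedy with linearly decaying epsilon on a Gaussian bandit."""
        n_arms = len(true_means)
        Q = np.zeros(n_arms)                               # estimated value per arm
        N = np.zeros(n_arms)                               # pull counts
        epsilons = np.linspace(init_epsilon, min_epsilon, n_episodes)
        for e in range(n_episodes):
            if np.random.random() < epsilons[e]:
                arm = np.random.randint(n_arms)            # explore
            else:
                arm = int(np.argmax(Q))                    # exploit
            reward = np.random.normal(true_means[arm], 1.0)
            N[arm] += 1
            Q[arm] += (reward - Q[arm]) / N[arm]           # incremental sample mean
        return Q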

5. Evaluating agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementation of algorithms that solve the prediction problem (policy estimation); a TD(0) sketch follows this list:
      • On-policy first-visit Monte-Carlo prediction
      • On-policy every-visit Monte-Carlo prediction
      • Temporal-Difference prediction (TD)
      • n-step Temporal-Difference prediction (n-step TD)
      • TD(λ)
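
As a rough sketch of the prediction setting, here is tabular TD(0) policy evaluation, assuming the classic Gym API where reset() returns a state and step() returns (next_state, reward, done, info); the step size and episode count are illustrative.

    import numpy as np

    def td_prediction(env, pi, gamma=1.0, alpha=0.01, n_episodes=500):
        """TD(0): move V(s) toward the one-step bootstrapped target after every transition."""
        V = np.zeros(env.observation_space.n)
        for _ in range(n_episodes):
            state, done = env.reset(), False
            while not done:
                action = pi(state)                         # pi is a callable state -> action
                next_state, reward, done, _ = env.step(action)
                td_target = reward + gamma * V[next_state] * (not done)
                V[state] += alpha * (td_target - V[state])
                state = next_state
        return V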

6. Improving agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementation of algorithms that solve the control problem (policy improvement); a Q-learning sketch follows this list:
      • On-policy first-visit Monte-Carlo control
      • On-policy every-visit Monte-Carlo control
      • On-policy TD control: SARSA
      • Off-policy TD control: Q-Learning
      • Double Q-Learning
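
For illustration, here is a minimal tabular Q-learning sketch (off-policy TD control) under the same classic Gym API assumption; the constant epsilon and alpha are simplifications of the decaying schedules a full implementation would use.

    import numpy as np

    def q_learning(env, gamma=1.0, alpha=0.5, epsilon=0.1, n_episodes=3000):
        """Q-learning: behave epsilon-greedily, but bootstrap from the greedy next action."""
        nS, nA = env.observation_space.n, env.action_space.n
        Q = np.zeros((nS, nA))
        for _ in range(n_episodes):
            state, done = env.reset(), False
            while not done:
                if np.random.random() < epsilon:
                    action = env.action_space.sample()
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done, _ = env.step(action)
                td_target = reward + gamma * Q[next_state].max() * (not done)
                Q[state][action] += alpha * (td_target - Q[state][action])
                state = next_state
        return Q, Q.max(axis=1), Q.argmax(axis=1)          # Q, V, greedy policy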

7. Achieving goals more effectively and efficiently

  • (Livebook)
  • (Notebook)
    • Implementation of more effective and efficient reinforcement learning algorithms (a SARSA(λ) sketch follows this list):
      • SARSA(λ) with replacing traces
      • SARSA(λ) with accumulating traces
      • Q(λ) with replacing traces
      • Q(λ) with accumulating traces
      • Dyna-Q
      • Trajectory Sampling
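
As an illustration of eligibility traces, here is a minimal SARSA(λ) sketch with replacing traces; the Gym API and hyperparameters are assumptions, as in the earlier sketches.

    import numpy as np

    def sarsa_lambda(env, gamma=1.0, alpha=0.1, epsilon=0.1, lambda_=0.9, n_episodes=3000):
        """SARSA(lambda) with replacing traces: every visited (s, a) shares in each TD error."""
        nS, nA = env.observation_space.n, env.action_space.n
        Q = np.zeros((nS, nA))

        def select_action(state):
            if np.random.random() < epsilon:
                return env.action_space.sample()
            return int(np.argmax(Q[state]))

        for _ in range(n_episodes):
            E = np.zeros((nS, nA))                         # eligibility traces, reset each episode
            state, done = env.reset(), False
            action = select_action(state)
            while not done:
                next_state, reward, done, _ = env.step(action)
                next_action = select_action(next_state)
                td_error = reward + gamma * Q[next_state][next_action] * (not done) - Q[state][action]
                E[state][action] = 1.0                     # replacing trace: clamp to 1, do not accumulate
                Q += alpha * td_error * E                  # update all traced pairs at once
                E *= gamma * lambda_                       # decay every trace
                state, action = next_state, next_action
        return Q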

8. Introduction to value-based deep reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementation of a value-based deep reinforcement learning baseline (a Q-network sketch follows this list):
      • Neural Fitted Q-iteration (NFQ)
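
For a flavor of this baseline, below is a minimal PyTorch sketch of a fully connected Q-network plus one fitted-Q regression step; the class and function names, layer sizes, and batch layout are illustrative assumptions, not the book's exact implementation.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    class FCQ(nn.Module):
        """Fully connected state-in, values-out Q-network."""
        def __init__(self, n_states, n_actions, hidden=(512, 128)):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_states, hidden[0]), nn.ReLU(),
                nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
                nn.Linear(hidden[1], n_actions),
            )

        def forward(self, state):
            return self.net(state)

    def nfq_fit_step(q_net, optimizer, batch, gamma=0.99):
        """One fitted-Q step: regress Q(s, a) toward r + gamma * max_a' Q(s', a')."""
        states, actions, rewards, next_states, dones = batch   # float tensors; actions is long, shape (B, 1)
        with torch.no_grad():
            max_next_q = q_net(next_states).max(dim=1, keepdim=True)[0]
            targets = rewards + gamma * max_next_q * (1 - dones)
        q_sa = q_net(states).gather(1, actions)
        loss = nn.functional.mse_loss(q_sa, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    q_net = FCQ(n_states=4, n_actions=2)                       # e.g. CartPole-sized input/output
    optimizer = optim.RMSprop(q_net.parameters(), lr=0.0005)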

9. More stable value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of "classic" value-based deep reinforcement learning methods (a double-DQN target sketch follows this list):
      • Deep Q-Networks (DQN)
      • Double Deep Q-Networks (DDQN)
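
As a short illustration of the key DDQN change, here is a sketch of the double-DQN target computation, reusing Q-networks like the FCQ sketch above; the function name and tensor shapes are assumptions.

    import torch

    def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
        """Double DQN: the online network selects the next action, the target network evaluates it."""
        with torch.no_grad():
            best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection
            next_q = target_net(next_states).gather(1, best_actions)             # evaluation
            return rewards + gamma * next_q * (1 - dones)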

10. Sample-efficient value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of main improvements for value-based deep reinforcement learning methods (a dueling-network sketch follows this list):
      • Dueling Deep Q-Networks (Dueling DQN)
      • Prioritized Experience Replay (PER)
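
For illustration, here is a minimal sketch of a dueling Q-network head; layer sizes and names are assumptions, and prioritized replay is left out for brevity.

    import torch.nn as nn

    class DuelingFCQ(nn.Module):
        """Shared trunk split into a state-value stream and an advantage stream."""
        def __init__(self, n_states, n_actions, hidden=128):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(n_states, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)                  # V(s)
            self.advantage = nn.Linear(hidden, n_actions)      # A(s, a)

        def forward(self, state):
            features = self.trunk(state)
            v, a = self.value(features), self.advantage(features)
            # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), which keeps the two streams identifiable
            return v + a - a.mean(dim=1, keepdim=True)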

11. Policy-gradient and actor-critic methods

  • (Livebook)
  • (Notebook)
    • Implementation of classic policy-based and actor-critic deep reinforcement learning methods (a REINFORCE sketch follows this list):
      • Policy Gradients without value function and Monte-Carlo returns (REINFORCE)
      • Policy Gradients with value function baseline trained with Monte-Carlo returns (VPG)
      • Asynchronous Advantage Actor-Critic (A3C)
      • Generalized Advantage Estimation (GAE)
      • [Synchronous] Advantage Actor-Critic (A2C)
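
As a rough sketch of the simplest method in this chapter, here is a REINFORCE loss for a single episode; it assumes the per-step log-probabilities were collected from a torch.distributions.Categorical policy during the rollout, and it deliberately uses no value-function baseline.

    import torch

    def reinforce_loss(log_probs, rewards, gamma=0.99):
        """REINFORCE: -sum_t log pi(a_t | s_t) * G_t, with Monte-Carlo returns G_t."""
        returns, G = [], 0.0
        for r in reversed(rewards):                        # build discounted returns back to front
            G = r + gamma * G
            returns.append(G)
        returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
        log_probs = torch.stack(log_probs)                 # one scalar log-prob tensor per time step
        return -(log_probs * returns).sum()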

12. Advanced actor-critic methods

  • (Livebook)
  • (Notebook)
    • Implementation of advanced actor-critic methods (a PPO-loss sketch follows this list):
      • Deep Deterministic Policy Gradient (DDPG)
      • Twin Delayed Deep Deterministic Policy Gradient (TD3)
      • Soft Actor-Critic (SAC)
      • Proximal Policy Optimization (PPO)
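
For a flavor of the policy-update side of PPO, here is a minimal sketch of the clipped surrogate loss; argument names and the clipping coefficient are illustrative assumptions.

    import torch

    def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
        """PPO clipped objective (negated for minimization): clip the probability ratio
        so a single update cannot push the policy too far from the data-collecting policy."""
        ratios = torch.exp(new_log_probs - old_log_probs.detach())
        unclipped = ratios * advantages
        clipped = torch.clamp(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()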

13. Towards artificial general intelligence