60_Days_RL_Challenge: A Jupyter Notebook repository from icemansina

I designed this Challenge for you and me: Learn Deep Reinforcement Learning in Depth in 60 days!!

You have heard about the amazing results achieved by DeepMind with AlphaGo Zero and by OpenAI in Dota 2! Don't you want to know how they work? This is the right opportunity for you and me to finally learn Deep RL and use it on new exciting projects.

The ultimate aim is to use these general-purpose technologies and apply them to all sorts of important real world problems. - Demis Hassabis

This repository aims to guide you through the Deep Reinforcement Learning algorithms, from the most basic ones to the highly advanced AlphaGo Zero. You will find the main topics organized by week, along with the suggested resources to learn them. Also, every week I will provide practical examples implemented in Python to help you better digest the theory. You are highly encouraged to modify and play with them!

Stay tuned

#60DaysRLChallenge

We now also have a Slack channel. To get an invitation, email me at andrea.lonza@gmail.com

This is my first project of this kind, so if you have any ideas, suggestions or improvements, please contact me at andrea.lonza@gmail.com.

To learn Deep Learning, Computer Vision or Natural Language Processing, check out my 1-Year-ML-Journey

Prerequisites

Basic level of Python and PyTorch
Machine Learning
Basic knowledge in Deep Learning (MLP, CNN and RNN)

Index

Week 1 - Introduction

An introduction to Reinforcement Learning by Arxiv Insights
Introduction and course overview - CS294 by Levine
Deep Reinforcement Learning: Pong from Pixels by Karpathy

Suggested

Great introductory paper: Deep Reinforcement Learning: An Overview
Start coding: From Scratch: AI Balancing Act in 50 Lines of Python

Week 2 - RL Basics: MDP, Dynamic Programming and Model-Free Control

Those who cannot remember the past are condemned to repeat it - George Santayana

This week, we will learn about the basic blocks of reinforcement learning, starting from the definition of the problem all the way through the estimation and optimization of the functions that are used to express the quality of a policy or state.

Theoretical material

Markov Decision Process RL by David Silver
- Markov Processes
- Markov Decision Processes

Planning by Dynamic Programming RL by David Silver
- Policy iteration
- Value iteration
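
To make value iteration concrete, here is a minimal sketch on a tiny, made-up MDP. The three-state chain, the transition table `P` and the function name `value_iteration` are assumptions chosen for illustration; they do not come from the lecture or from this repository.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state, reward) outcomes.
    States with no actions (empty dict) are treated as terminal."""
    V = {s: 0.0 for s in P}

    def action_value(s, a):
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: value stays at 0
                continue
            best = max(action_value(s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # Bellman optimality backup, updated in place
        if delta < tol:
            break

    # Greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: action_value(s, a)) for s in P if P[s]}
    return V, policy


# Hypothetical chain: 0 -> 1 -> 2 (terminal), reward 1 for entering state 2
P = {
    0: {"right": [(1.0, 1, 0.0)], "stay": [(1.0, 0, 0.0)]},
    1: {"right": [(1.0, 2, 1.0)], "left": [(1.0, 0, 0.0)]},
    2: {},
}
V, policy = value_iteration(P)
```

Policy iteration would instead alternate a full policy evaluation with a greedy improvement step; the backup inside the loop is the part that differs.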

Model-Free Prediction RL by David Silver
- Monte Carlo Learning
- Temporal Difference Learning
- TD(λ)
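
The difference between Monte Carlo and Temporal Difference prediction can be sketched in a few lines. This is only an illustration under simple assumptions (a tabular value function stored in a dict, hypothetical function names `mc_returns` and `td0_update`), not code from the lectures.

```python
def mc_returns(rewards, gamma=1.0):
    """Discounted return G_t for every step of a finished episode.
    Monte Carlo learning uses these full returns as targets."""
    g, returns = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, done=False):
    """One TD(0) step: bootstrap from the current estimate of the next state
    instead of waiting for the end of the episode."""
    target = r if done else r + gamma * V[s_next]
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


returns = mc_returns([1.0, 0.0, 2.0], gamma=0.5)
V = {"A": 0.0, "B": 1.0}
error = td0_update(V, "A", 0.5, "B", alpha=0.5, gamma=1.0)
```

TD(λ) interpolates between these two targets; it is not covered by this sketch.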

Model-Free Control RL by David Silver
- Ɛ-greedy policy iteration
- GLIE Monte-Carlo Control
- SARSA
- Importance Sampling

Project of the Week

Q-learning applied to FrozenLake. As an exercise, you can solve the game using SARSA or implement Q-learning by yourself. In the former case, only a few changes are needed.
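
To see why moving from Q-learning to SARSA takes only small changes, compare the two tabular updates below. This is a hedged sketch, not the notebook's code: the Q-table is assumed to be a list of per-state lists, and the function names are hypothetical. The environment loop (for example FrozenLake) is left out on purpose.

```python
import random


def epsilon_greedy(Q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    n_actions = len(Q[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])


def q_learning_update(Q, s, a, r, s2, alpha, gamma, done):
    """Off-policy: the target uses the best action in the next state."""
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done):
    """On-policy: the target uses the action a2 actually taken next."""
    target = r if done else r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])


Q = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(Q, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9, done=False)
sarsa_update(Q, 0, 1, 1.0, 1, 0, alpha=0.5, gamma=0.9, done=False)
```

The only difference is the bootstrap term: `max(Q[s2])` for Q-learning versus `Q[s2][a2]` for SARSA, which is why SARSA has to choose the next action before updating.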

To know more

📚 Read chapters 3,4,5,6,7 of Reinforcement Learning An Introduction - Sutton, Barto
📺 Value functions introduction - DRL UC Berkeley by Sergey Levine

Week 3 - Value Function Approximation and DQN

This week we'll learn more advanced concepts and apply deep neural networks to Q-learning algorithms.

Theoretical material

Lectures

Value functions approximation - RL by David Silver
- Differentiable function approximators
- Incremental methods
- Batch methods (DQN)

Advanced Q-learning algorithms - DRL UC Berkeley by Sergey Levine
- Replay Buffer
- Double Q-learning
- Continuous actions (NAF, DDPG)
- Practical tips
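
Two of the ideas listed above, the replay buffer and the Double Q-learning target, can be sketched without any deep learning library. The class and function names here are hypothetical, and plain Python lists stand in for the Q-values a network would output.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest ones are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


buf = ReplayBuffer(capacity=2)
for t in range(3):
    buf.push((t, 0, 0.0, t + 1, False))  # (state, action, reward, next_state, done)
batch = buf.sample(2)
target = double_q_target(1.0, False, [0.1, 0.9], [5.0, 2.0], gamma=0.5)
```

A plain DQN target would use `max(q_target_next)` instead, which tends to overestimate the values.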

Papers

Must Read

Extensions of DQN

Deep Reinforcement Learning with Double Q-learning - 2015
Prioritized Experience Replay - 2015
Dueling Network Architectures for Deep Reinforcement Learning - 2016
Noisy networks for exploration - 2017
Distributional Reinforcement Learning with Quantile Regression - 2017

Project of the Week

DQN and some variants applied to Pong

This week the goal is to develop a DQN algorithm to play an Atari game. To make it more interesting I developed several extensions of DQN: Double Q-learning, Multi-step learning, Dueling networks and Noisy Nets. Play with them, and if you feel confident, you can implement Prioritized replay, Dueling networks or Distributional RL. To know more about these improvements read the papers!
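
As a small taste of one of these extensions, the dueling architecture combines a state-value estimate with per-action advantages, subtracting the mean advantage as described in the Dueling Network Architectures paper. The sketch below only shows that aggregation step on plain numbers; the function name is hypothetical, and in a real network both inputs would come from two separate heads.

```python
def dueling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


q_values = dueling_q_values(1.0, [1.0, 3.0])
```

Subtracting the mean keeps the value and advantage streams identifiable: shifting all the advantages by a constant leaves the Q-values unchanged.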

Suggested

📺 Deep Reinforcement Learning in the Enterprise: Bridging the Gap from Games to Industry

Week 4 - Policy gradient methods and A2C

Week 4 introduces Policy Gradient methods, a class of algorithms that directly optimize the policy. You'll also learn about Actor-Critic algorithms, which combine a policy gradient (the actor) with a value function (the critic).
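
A single policy gradient step on a softmax policy can be sketched as follows. This is an illustration under simplifying assumptions: the policy is just a list of logits for one state, the function names are hypothetical, and the `advantage` argument stands for whatever signal is used (the return in REINFORCE, or the return minus the critic's value estimate in an actor-critic).

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, advantage, lr=0.1):
    """Gradient ascent on advantage * log pi(action).
    For a softmax policy, d log pi(action) / d logit_k = 1[k == action] - pi(k)."""
    probs = softmax(logits)
    grad_log_pi = [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]
    return [x + lr * advantage * g for x, g in zip(logits, grad_log_pi)]


before = softmax([0.0, 0.0, 0.0])
after_good = softmax(policy_gradient_step([0.0, 0.0, 0.0], action=1, advantage=1.0))
after_bad = softmax(policy_gradient_step([0.0, 0.0, 0.0], action=1, advantage=-1.0))
```

A positive advantage raises the probability of the sampled action and a negative one lowers it, which is the whole idea behind the actor update.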

Theoretical material

Lectures

Policy gradient Methods - RL by David Silver
- Finite Difference Policy Gradient
- Monte-Carlo Policy Gradient
- Actor-Critic Policy Gradient

Policy gradient intro - CS294-112 by Sergey Levine (RECAP, optional)
- Policy Gradient (REINFORCE and Vanilla PG)
- Variance reduction

Actor-Critic - CS294-112 by Sergey Levine (More in depth)
- Actor-Critic
- Discount factor
- Actor-Critic algorithm design (batch mode or online)
- State-dependent baseline

Papers

Project of the Week

Vanilla PG and A2C

The exercise of this week is to implement a policy gradient method or a more sophisticated actor-critic. In the repository you can find an implemented version of PG and A2C. Be aware that my A2C implementation gives strange results. You can try to make it work or implement an asynchronous version of A2C (A3C).

Suggested

Week 5 - Advanced Policy Gradients - TRPO & PPO

This week is about advanced policy gradient methods that improve the stability and convergence of the "Vanilla" policy gradient methods. You'll learn and implement PPO, an RL algorithm developed by OpenAI and adopted in OpenAI Five.
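
The core of PPO is its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below evaluates that objective for one sample; the function name is hypothetical and `clip_eps=0.2` is just a commonly used default, not necessarily the value used in the Week5 implementation.

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


gain_capped = ppo_clip_objective(1.5, 1.0)    # ratio too high, good action
loss_capped = ppo_clip_objective(0.5, -1.0)   # ratio too low, bad action
not_capped = ppo_clip_objective(1.5, -1.0)    # ratio too high, bad action
```

In the first two cases the clipped term is selected, so there is no incentive to push the ratio further from 1. In the third case the unclipped term is kept, so the update can still undo a change that made a bad action more likely.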

Theoretical material

Lectures

Advanced policy gradients - CS294-112 by Sergey Levine
- Problems with "Vanilla" Policy Gradient Methods
- Policy Performance Bounds
- Monotonic Improvement Theory
- Algorithms: NPO, TRPO, PPO

Natural Policy Gradients, TRPO, PPO - John Schulman, Berkeley DRL Bootcamp - (RECAP, optional)
- Limitations of "Vanilla" Policy Gradient Methods
- Natural Policy Gradient
- Trust Region Policy Optimization, TRPO
- Proximal Policy Optimization, PPO

Papers

Trust Region Policy Optimization - 2015
Proximal Policy Optimization Algorithms - 2017

Project of the Week

This week, you have to implement PPO or TRPO. I suggest PPO given its simplicity (compared to TRPO). In the project folder Week5 you can find an implementation of PPO that learns to play BipedalWalker. Furthermore, in the folder you can find other resources that will help you in the development of the project. Have fun!

To learn more about PPO read the paper and take a look at the Arxiv Insights video

NB: the hyperparameters of the PPO implementation I released can be tuned to improve the convergence.

Suggested

📚 To better understand PPO and TRPO: The Pursuit of (Robotic) Happiness
📺 Nuts and Bolts of Deep RL
📚 PPO best practice: Training with Proximal Policy Optimization
📺 Explanation of the PPO algorithm by Arxiv Insights

Week 6 - Evolution Strategies and Genetic Algorithms

In the last year, Evolution Strategies (ES) and Genetic Algorithms (GA) have been shown to achieve results comparable to RL methods. They are derivative-free black-box algorithms that require more data than RL to learn but are able to scale up across thousands of CPUs. This week we'll look at these black-box algorithms.
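
To show what "derivative-free" means in practice, here is a minimal ES sketch that maximizes a toy function using only function evaluations. It is a simplified illustration, not the Week6 implementation: it uses Gaussian perturbations with mirrored (antithetic) pairs, skips the fitness shaping and parallelism of real ES setups, and the function name, toy objective and hyperparameters are all assumptions.

```python
import random


def evolution_strategy(f, theta, sigma=0.1, lr=0.1, pairs=50, iters=100, seed=0):
    """Estimate an ascent direction from perturbed evaluations of f only."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iters):
        grad = [0.0] * dim
        for _ in range(pairs):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            plus = [t + sigma * e for t, e in zip(theta, eps)]
            minus = [t - sigma * e for t, e in zip(theta, eps)]
            diff = f(plus) - f(minus)  # mirrored pair reduces variance
            for j in range(dim):
                grad[j] += diff * eps[j]
        # Average over the pairs and rescale by the perturbation size
        theta = [t + lr * g / (2 * pairs * sigma) for t, g in zip(theta, grad)]
    return theta


def toy_fitness(params):
    """Hypothetical objective with its maximum at (3, -2)."""
    return -((params[0] - 3.0) ** 2 + (params[1] + 2.0) ** 2)


solution = evolution_strategy(toy_fitness, [0.0, 0.0])
```

Each perturbation can be evaluated independently, which is what makes this family of methods easy to spread across many CPUs.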

Material

Evolution Strategies
Genetic Algorithms
- Introduction to Genetic Algorithms — Including Example Code

Papers

Project of the Week

The project is to implement an ES or a GA. In the Week6 repository you can find a basic implementation of the paper Evolution Strategies as a Scalable Alternative to Reinforcement Learning to solve LunarLanderContinuous. You can modify it to play more difficult environments or add your ideas.

Week 7 - Model-Based reinforcement learning

The algorithms studied up to now are model-free, meaning that they learn to choose the best action given a state without learning a model of the environment. These algorithms achieve very good performance but require a lot of training data. Model-based algorithms, instead, learn a model of the environment and plan the next actions according to the learned model. These methods are more sample efficient than model-free ones but overall achieve worse performance. This week you'll learn the theory behind these methods and implement one of the latest algorithms.
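
The "learn a model, then plan with it" loop can be sketched on a toy problem. Everything below is an assumption made for illustration (a five-state chain, a deterministic tabular model, exhaustive search over short action sequences); it is not the algorithm from the paper used in the project of the week.

```python
from itertools import product

ACTIONS = (-1, 1)  # move left or right on a chain of states 0..4


def true_step(state, action):
    """Hypothetical environment, hidden from the planner."""
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


def learn_model(transitions):
    """Fit a deterministic tabular model from observed (s, a, s', r) tuples."""
    return {(s, a): (s2, r) for s, a, s2, r in transitions}


def plan_first_action(model, state, horizon):
    """Simulate every action sequence inside the learned model and
    return the first action of the best one."""
    best_return, best_action = float("-inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            if (s, a) not in model:  # never observed: stop this rollout
                break
            s, r = model[(s, a)]
            total += r
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action


# Collect experience from the real environment, then plan in the model only
data = [(s, a, *true_step(s, a)) for s in range(5) for a in ACTIONS]
model = learn_model(data)
action = plan_first_action(model, state=2, horizon=3)
```

Once the model is learned, planning needs no further interaction with the environment, which is where the sample efficiency comes from. The price is that any error in the model is carried into the plan.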

Material

Model-Based RL by David Silver (DeepMind) (concise version)
- Integrating Learning and Planning
  - Model-Based RL Overview
  - Integrated architectures
  - Simulation-Based search
Model-Based RL by Sergey Levine (Berkeley) (in depth version)
- Learning dynamical systems from data
  - Overview of model-based RL
  - Global and local models
  - Learning with local models and trust regions
- Learning policies by imitating optimal controllers
  - Backpropagation into a policy with learned models
  - Guided policy search algorithm
  - Imitating optimal control with DAgger
- Advanced model learning and images
  - Models in latent space
  - Models directly in image space
  - Inverse models

Papers

Project of the Week

As a project, I chose to implement the model-based algorithm described in this paper. You can find my implementation here. NB: Instead of implementing it on MuJoCo as in the paper, I used RoboSchool, an open-source robot simulator integrated with OpenAI Gym.

Suggested

📚 World Models - Can agents learn inside of their own dreams?

Week 8 - Advanced Concepts and Project Of Your Choice

This last week is about advanced RL concepts and a project of your choice.

Material

Sergey Levine (Berkeley)
David Silver (DeepMind)
- Classic Games

The final project

Here you can find some project ideas.

Pommerman (Multiplayer)
AI for Prosthetics Challenge (Challenge)
World Models (Paper implementation)
Request for research OpenAI (Research)
Retro Contest (Transfer learning)

Suggested

AlphaGo Zero
- Paper
- DeepMind blog post: AlphaGo Zero: Learning from scratch
- Arxiv Insights video: How AlphaGo Zero works - Google DeepMind
OpenAI Five
- OpenAI blog post: OpenAI Five
- Arxiv Insights video: OpenAI Five: Facing Human Pro's in Dota II

Last 4 days - Review + Sharing

Congratulations on completing the 60 Days RL Challenge!! Let me know if you enjoyed it and share it!

See you!

Best resources

📺 Deep Reinforcement Learning - UC Berkeley class by Levine, check their site here.

📺 Reinforcement Learning course - by David Silver, DeepMind. Great introductory lectures by Silver, a lead researcher on AlphaGo. They follow the book Reinforcement Learning by Sutton & Barto.

📓 Reinforcement Learning: An Introduction - by Sutton & Barto. The "Bible" of reinforcement learning. Here you can find the PDF draft of the second edition.

Additional resources

📚 Awesome Reinforcement Learning. A curated list of resources dedicated to reinforcement learning

📚 GroundAI on RL. Papers on reinforcement learning
