This project implements various reinforcement learning algorithms, including the state-of-the-art approaches DDPG and D4PG (incomplete), and uses them to solve a challenging balance task. A systematic comparison of the implemented algorithms is also given. The MSc dissertation is available here.
- Q Learning
- Policy Gradient
- Actor-critic
- DQN (Deep Q-network)
- Prioritized DQN (DQN + Prioritized experience replay)
- Categorical DQN (DQN + Distributional perspective)
- DDPG (Deep Deterministic Policy Gradient)
- Incomplete D4PG (DDPG + Prioritized experience replay)
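As a hedged illustration of the prioritized experience replay idea used in Prioritized DQN and the incomplete D4PG above, the sketch below samples transitions with probability proportional to their TD-error priority. Names such as `PrioritizedBuffer` are illustrative, not taken from this repository:

```python
import random

class PrioritizedBuffer:
    """Minimal proportional prioritized replay buffer (illustrative sketch)."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha          # how strongly priorities skew sampling
        self.transitions = []      # stored (s, a, r, s_next, done) tuples
        self.priorities = []       # one priority per transition

    def add(self, transition, td_error):
        # New transitions get priority |TD error| ** alpha (plus a small
        # epsilon so nothing has zero probability of being sampled).
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.transitions)), weights=probs, k=batch_size)
        return [self.transitions[i] for i in idxs], idxs

    def update_priorities(self, idxs, td_errors):
        # After learning, refresh priorities with the new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = (abs(err) + 1e-6) ** self.alpha
```

A full implementation would also apply importance-sampling weights to correct the bias introduced by non-uniform sampling.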
The first deep reinforcement learning algorithm, deep Q-network (DQN), was proposed by DeepMind in 2015. It is based on Q-learning, a basic reinforcement learning algorithm. Since then, reinforcement learning has been empowered by deep neural networks directly; one of the most significant achievements is that reinforcement learning became capable of dealing with high-dimensional states such as raw image pixels. The ideas underlying the success of DQN were later adapted to continuous action spaces, yielding the deep deterministic policy gradient (DDPG) algorithm, presented in 2016. It relies on an actor-critic architecture. Furthermore, the distributed distributional deep deterministic policy gradient (D4PG) algorithm, an extended version of DDPG, was proposed in 2018. It adopts several very successful improvements (e.g. the distributional perspective) and works within a distributed framework. Below is a flowchart showing the relationships among those algorithms.
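To make the lineage concrete, the hedged sketch below shows the tabular Q-learning update that DQN generalises with a neural network. It is a minimal sketch assuming a toy environment with two discrete actions; the function names are hypothetical, not from this repository:

```python
from collections import defaultdict
import random

ACTIONS = (0, 1)  # assumed toy action space

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * TD-error.

    DQN replaces the table Q with a neural network and minimises the
    squared TD error over minibatches drawn from a replay buffer.
    """
    # Bootstrap target r + gamma * max_a' Q(s', a'); no future value at terminal states.
    target = r if done else r + gamma * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

def epsilon_greedy(Q, s, epsilon=0.1):
    # Explore with probability epsilon, otherwise act greedily on Q.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```

DDPG keeps the same TD target but, because the action space is continuous, the `max` over actions is replaced by a learned deterministic actor evaluated by the critic.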
This is a balance task for a two-wheeled robot. The robot needs to go down one slope, climb another, and keep its body balanced. It is a torque-controlled robot with two actuated joints, one on each wheel. The simulation is powered by PyBullet. You can watch the full video here.
Test and compare DQN, Prioritized DQN, and Categorical DQN on the OpenAI Gym CartPole-v1 task.
Test and compare DQN, DDPG, and the incomplete D4PG on the balance task.
This project implements the state-of-the-art deep reinforcement learning algorithms DDPG and D4PG (incomplete) from scratch (i.e. the implementation builds up from basic RL algorithms such as Q-learning). It solves the challenging balance task with deep reinforcement learning rather than conventional control theory such as PID control. Moreover, we compare five different deep reinforcement learning algorithms and show that D4PG performs best among them.
Each Python file implements one reinforcement learning algorithm on either the balance task (i.e. the designed task) or the CartPole task. There are 10 Python files in total, covering 8 RL algorithms. All implementations are from scratch.
- Ubuntu 14.04, 16.04
- Python 3.6, 3.5
- PyTorch v0.4.1
- OpenAI Gym (Classic control)
- PyBullet 2.1
- Playing Atari with Deep Reinforcement Learning
- Policy Gradient Methods for Reinforcement Learning with Function Approximation
- Deterministic Policy Gradient Algorithms
- Continuous Control with Deep Reinforcement Learning
- Distributed Prioritized Experience Replay
- Distributed Distributional Deterministic Policy Gradients
- PyBullet Quickstart Guide
- DeepMind Control Suite
- Deep Reinforcement Learning Hands-On
- Morvan tutorials