/Reinforcement-Learning

MSc Project (2018.04 - 2018.09)

Primary language: Python

Reinforcement-Learning in robot's locomotion control

This project implements several reinforcement learning algorithms, including the state-of-the-art approaches DDPG and D4PG (incomplete), and uses them to solve a challenging balancing task. It also gives a systematic comparison of the implemented algorithms. The MSc dissertation is available here.

Description of implemented algorithms

  • Q Learning
  • Policy Gradient
  • Actor-critic
  • DQN (Deep Q-network)
  • Prioritized DQN (DQN + Prioritized experience replay)
  • Categorical DQN (DQN + Distributional perspective)
  • DDPG (Deep Deterministic Policy Gradient)
  • Incomplete D4PG (DDPG + Prioritized experience replay)
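Since the list above starts from Q-learning, a minimal tabular sketch may help fix the core update rule that the deep variants build on. It is not taken from this repository: the 1-D chain environment, the function name `q_learning_chain` and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    The agent starts in state 0 and receives reward 1 when it reaches the
    right-most state; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):              # cap the episode length
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = min(max(state + moves[action], 0), goal)
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Q-learning target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


q_table = q_learning_chain()
greedy_policy = ["left" if left > right else "right"
                 for left, right in q_table[:-1]]
```

DQN keeps the same bootstrapped target but replaces the table `q` with a neural network, which is what makes high-dimensional states tractable.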

The first deep reinforcement learning algorithm, the deep Q-network (DQN), was proposed by DeepMind in 2015. It builds on Q-learning, a basic reinforcement learning algorithm, and showed that reinforcement learning can be driven directly by a deep neural network. One of its most significant achievements is that it can handle high-dimensional states such as raw image pixels. The ideas underlying the success of DQN were then adapted to continuous action spaces, giving the deep deterministic policy gradient (DDPG) algorithm, which was proposed in 2015 and relies on an actor-critic architecture. The distributed distributional deep deterministic policy gradient (D4PG) algorithm, an extension of DDPG, was proposed in 2018. It adopts several successful improvements (e.g. the distributional perspective) and works within a distributed framework. The flowchart below shows the relationships among these algorithms.

[Figure: flowchart of the relationships among the algorithms]
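One of the DQN ideas that DDPG carries over is the target network. DQN copies the online parameters into the target network periodically, while DDPG blends them in slowly at every step. The sketch below contrasts the two on plain Python lists; the function names and the `tau` values are illustrative assumptions, and a real implementation would apply the same rule to network weight tensors.

```python
def hard_update(online_params):
    """DQN-style target update: copy the online parameters outright."""
    return list(online_params)


def soft_update(target_params, online_params, tau=0.001):
    """DDPG-style target update (Polyak averaging).

    Each target parameter moves a fraction `tau` of the way towards the
    corresponding online parameter.
    """
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


# Hypothetical parameters: the target slowly tracks the online values.
target = [0.0, 0.0]
online = [1.0, -2.0]
for _ in range(1000):
    target = soft_update(target, online, tau=0.01)
```

A small `tau` keeps the bootstrapped targets nearly stationary between updates, which tends to stabilise training at the cost of slower propagation of new value estimates.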

The Balancing task (Designed task)

The task is a balancing task for a two-wheeled robot. The robot must go down one slope, climb another slope and keep its body balanced. It is a torque-controlled robot with two actuated joints on its two wheels. The simulation is powered by PyBullet. You can watch the full video here.

[Figure: the balancing task]

Results

DQN, Prioritized DQN and Categorical DQN are tested and compared on the OpenAI Gym CartPole-v1 task.

[Figure: results of DQN, Prioritized DQN and Categorical DQN on CartPole-v1]
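Prioritized DQN differs from plain DQN in how transitions are drawn from the replay buffer. As a rough illustration, the sketch below implements proportional prioritized sampling with importance-sampling weights. It is not this repository's code: the function name and the `alpha` and `beta` defaults are assumptions, and normalising the weights by the batch maximum is a simplification.

```python
import random


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Sample buffer indices with probability proportional to priority**alpha.

    Returns (indices, weights), where the weights correct for the
    non-uniform sampling and are scaled so that the largest one is 1.
    """
    if not priorities or min(priorities) <= 0:
        raise ValueError("priorities must be a non-empty list of positive numbers")

    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]

    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)

    # Importance-sampling weights: rarely sampled transitions count for more.
    n = len(priorities)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_raw = max(raw)
    return indices, [w / max_raw for w in raw]


# Hypothetical buffer in which the third transition has a large TD error.
batch_indices, batch_weights = sample_prioritized(
    [1.0, 1.0, 100.0], batch_size=4, rng=random.Random(0))
```

With `alpha=0` the sampling becomes uniform, which recovers the plain DQN replay buffer.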

DQN, DDPG and incomplete D4PG are tested and compared on the balancing task.

[Figure: results of DQN, DDPG and incomplete D4PG on the balancing task]

Conclusion

This project implements the state-of-the-art deep reinforcement learning algorithms DDPG and D4PG (incomplete) from scratch, i.e. the implementation starts from basic RL algorithms such as Q-learning. It solves the challenging balancing task with a novel approach (deep reinforcement learning) instead of conventional control theory such as PID control. Moreover, we compare five different deep reinforcement learning algorithms, and the results indicate that D4PG performs best among them.

Usage and Dependencies

Each Python file implements one reinforcement learning algorithm on either the balancing task (i.e. the designed task) or the CartPole task. There are 10 Python files in total, covering 8 RL algorithms. All implementations are written from scratch.

  • Ubuntu 14.04, 16.04
  • Python 3.6, 3.5
  • PyTorch v0.4.1
  • OpenAI Gym (Classic control)
  • PyBullet 2.1
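Before running any of the scripts, it can be useful to check that the Python packages listed above are importable. The helper below is a small sketch for that purpose and is not part of the repository. The import names `torch`, `gym` and `pybullet` are the usual ones for PyTorch, OpenAI Gym and PyBullet, and the function name is hypothetical.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED_MODULES = ("torch", "gym", "pybullet")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the names of modules that cannot be found by this interpreter."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


# Example: an empty list means every required package is importable.
missing = missing_dependencies()
```

The check only confirms that the packages can be located. It does not verify that the installed versions match the ones listed.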

Main References