ddpg_biped

Repository for a planar bipedal walking robot in a Gazebo environment, trained with the Deep Deterministic Policy Gradient (DDPG) algorithm implemented in TensorFlow.

Primary language: Python

Reinforcement Learning for a Bipedal Walking Robot

This repository contains the simulation architecture, based on the Gazebo environment, for implementing the DDPG reinforcement learning algorithm to generate bipedal walking patterns for the robot.


The autonomous walking of the bipedal robot is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG) [1]. DDPG utilises the actor-critic learning framework to learn control policies in continuous action spaces.
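As a rough illustration of the actor-critic structure DDPG relies on, the sketch below shows a generic actor (state → deterministic action), critic (state-action → Q-value), slowly-updated target networks, and one update step. This is not the repository's actual code; the state/action dimensions, layer sizes, and hyperparameters are assumptions.

```python
# Generic DDPG actor-critic sketch in TensorFlow/Keras (illustrative only;
# dimensions, layer sizes, and hyperparameters are assumptions, not the
# values used in this repository).
import tensorflow as tf

STATE_DIM, ACTION_DIM = 10, 4    # assumed state/action dimensions
GAMMA, TAU = 0.99, 0.005         # assumed discount factor and soft-update rate

def build_actor():
    # The actor maps a state to a deterministic action; tanh bounds each
    # joint command to [-1, 1] (scaled to the joint limits in practice).
    state = tf.keras.Input(shape=(STATE_DIM,))
    h = tf.keras.layers.Dense(400, activation="relu")(state)
    h = tf.keras.layers.Dense(300, activation="relu")(h)
    action = tf.keras.layers.Dense(ACTION_DIM, activation="tanh")(h)
    return tf.keras.Model(state, action)

def build_critic():
    # The critic estimates Q(s, a) for a state-action pair.
    state = tf.keras.Input(shape=(STATE_DIM,))
    action = tf.keras.Input(shape=(ACTION_DIM,))
    h = tf.keras.layers.Concatenate()([state, action])
    h = tf.keras.layers.Dense(400, activation="relu")(h)
    h = tf.keras.layers.Dense(300, activation="relu")(h)
    return tf.keras.Model([state, action], tf.keras.layers.Dense(1)(h))

actor, critic = build_actor(), build_critic()
target_actor, target_critic = build_actor(), build_critic()
target_actor.set_weights(actor.get_weights())
target_critic.set_weights(critic.get_weights())
actor_opt = tf.keras.optimizers.Adam(1e-4)
critic_opt = tf.keras.optimizers.Adam(1e-3)

def train_step(states, actions, rewards, next_states, dones):
    # Critic: regress Q(s, a) toward the bootstrapped TD target computed
    # with the slowly-updated target networks.
    target_q = rewards + GAMMA * (1.0 - dones) * target_critic(
        [next_states, target_actor(next_states)])
    with tf.GradientTape() as tape:
        critic_loss = tf.reduce_mean(tf.square(target_q - critic([states, actions])))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor: deterministic policy gradient, i.e. maximise the critic's
    # value of the actions the actor itself would take.
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([states, actor(states)]))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))

    # Soft-update the target networks toward the learned networks.
    for t, s in zip(target_actor.variables + target_critic.variables,
                    actor.variables + critic.variables):
        t.assign(TAU * s + (1.0 - TAU) * t)
```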

The project details and the results of the experiment are documented in the research manuscript, "Bipedal walking robot using Deep Deterministic Policy Gradient".

This project was developed at the Computational Intelligence Laboratory, IISc, Bangalore.

What you need before starting (Dependencies & Packages): Python, TensorFlow, and the Gazebo simulator.

File setup:

  • walker_gazebo contains the robot model (both the .stl files & the .urdf file) as well as the Gazebo launch file.

  • walker_controller contains the reinforcement learning implementation of the DDPG algorithm for control of the bipedal walking robot (a sketch of the training loop is shown after this list).
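The sketch below, continuing the DDPG sketch above, illustrates how such a controller script typically drives the simulation: a Gym-style agent-environment loop with exploration noise and an experience replay buffer. The `env` object, its reset()/step() interface, and all hyperparameters are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical DDPG training loop (illustrative only; `env` is assumed to be
# a Gym-style wrapper around the Gazebo simulation, and `actor`/`train_step`
# come from the sketch above).
import random
from collections import deque
import numpy as np

replay_buffer = deque(maxlen=100000)          # experience replay memory
noise_scale, batch_size, episodes = 0.1, 64, 1000

for episode in range(episodes):
    state = env.reset()
    done = False
    while not done:
        # Deterministic action from the actor plus exploration noise
        # (the original DDPG paper uses Ornstein-Uhlenbeck noise;
        # Gaussian noise is used here for brevity).
        action = actor.predict(state[None])[0]
        action += noise_scale * np.random.randn(*action.shape)

        next_state, reward, done, _ = env.step(action)
        replay_buffer.append((state, action, reward, next_state, float(done)))
        state = next_state

        # Sample a minibatch and perform one DDPG update.
        if len(replay_buffer) >= batch_size:
            batch = random.sample(replay_buffer, batch_size)
            states, actions, rewards, next_states, dones = map(
                lambda x: np.asarray(x, dtype=np.float32), zip(*batch))
            train_step(states, actions, rewards[:, None],
                       next_states, dones[:, None])
```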

Learning to walk, initial baby steps

Stable bipedal walking

[Project video]

Note: Stable bipedal walking was achieved after training the model for over 41 hours on a system with an Nvidia GeForce GTX 1050 Ti GPU. The visualization of the horizontal boom (attached to the waist) is turned off.

Sources:

  1. Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
  2. Silver, David, et al. "Deterministic Policy Gradient Algorithms." ICML (2014).

Project Collaborator(s):

Arun Kumar (arunkumar12@iisc.ac.in) & Dr. S N Omkar (omkar@iisc.ac.in)

Future work

Implement state-of-the-art RL algorithms (TRPO & PPO) for the same task, in the hope of faster training and shorter convergence time. A short sketch of the PPO clipped objective is given below.
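For reference, PPO replaces DDPG's deterministic policy gradient with a clipped surrogate objective on a stochastic policy. The sketch below is illustrative only and not part of this repository; the function and variable names are assumptions.

```python
# Illustrative sketch of the PPO clipped surrogate objective (not part of
# this repository; names are assumptions).
import tensorflow as tf

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated policy and the behaviour policy.
    ratio = tf.exp(new_log_probs - old_log_probs)
    # Clipping keeps each policy update close to the old policy, which is
    # what gives PPO its training stability.
    clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
```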