Humanoid walking simulation using Kinect and reinforcement learning

Objective:

To compare humanoid walking simulations produced with Deep Deterministic Policy Gradient (DDPG), Normalized Advantage Function (NAF), and Kinect.

Problem Type:

Reinforcement Learning and Leveraging Vision through an **RGB-D** Sensor

WALKING SIMULATION

Method - 1 - Deep Deterministic Policy Gradient (DDPG)
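
For orientation, the two update rules at the core of DDPG can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the training code behind the runs in this section: the function names `td_target` and `soft_update`, the discount `gamma=0.99`, and the mixing rate `tau=0.001` are all chosen for the example.

```python
import numpy as np

def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: y = r + gamma * Q'(s', mu'(s')).

    next_q is assumed to be the target critic's value for the action
    proposed by the target actor; terminal steps (done == 1) drop it.
    """
    reward = np.asarray(reward, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    done = np.asarray(done, dtype=float)
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target_weights, online_weights, tau=0.001):
    """Move each target-network weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_weights, online_weights)]

# Toy batch of two transitions; the second one ends the episode,
# so y[0] = 1.0 + 0.99 * 2.0 and y[1] = 0.5.
y = td_target(reward=[1.0, 0.5], next_q=[2.0, 3.0], done=[0, 1])
print(y)
```

The critic is trained to regress onto `td_target`, while `soft_update` keeps the target networks changing slowly, which is what stabilises the bootstrapped target.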

Case 1

Actor - Activation Function - Linear
Critic - Activation Function - Sigmoid

Output

Case 2

Actor - Activation Function - Linear
Critic - Activation Function - Tanh

Output

import base64
from IPython.display import HTML

# Read the recorded rollout (read-only) and embed it as a base64 data URI.
with open('model_linear_tanh.mp4', 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video).decode('ascii')
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded))

Case 3

Actor - Activation Function - Sigmoid
Critic - Activation Function - Tanh

Output

import base64
from IPython.display import HTML

# Read the recorded rollout (read-only) and embed it as a base64 data URI.
with open('model_sigmoid_tanh.mp4', 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video).decode('ascii')
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded))
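
The three cases differ only in the activation functions listed for the actor and critic. As a rough illustration of what that choice changes, the sketch below applies linear, sigmoid, and tanh activations to the same pre-activation values. It assumes the listed activation is applied at the output layer of each network, which these notes do not state, and the helper names are hypothetical.

```python
import numpy as np

def linear(x):
    """Identity activation: output is unbounded."""
    return x

def sigmoid(x):
    """Logistic activation: output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical lookup table covering the activations used in Cases 1-3.
ACTIVATIONS = {"linear": linear, "sigmoid": sigmoid, "tanh": np.tanh}

# The same pre-activation values passed through each function.
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for name, fn in ACTIVATIONS.items():
    out = fn(z)
    print(f"{name:8s} min={out.min():+.3f} max={out.max():+.3f}")
```

Under that assumption, a sigmoid or tanh critic can only represent value estimates inside its bounded output range, and a sigmoid actor can only emit non-negative actions, so the reward scale and the action range of the simulator matter when comparing the runs.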