Observation Space | Reward | NN Structure | Method Hyperparams | Status | Results |
---|---|---|---|---|---|
Default | Time-step-dependent | Default | Default | (?) | (?) |
GLM_paper + foot normal force + prev_action | Iteration-dependent velocity (sampling around zero -> sampling around 2; see the sketch below the table) | Default | Default | Done | Better learning curve (eps_length = 980 at around 500k iterations) |
GLM_paper + foot normal force + prev_action | Iteration-dependent velocity (sampling around 1 with zero variance -> sampling around 1 with variance 1) | Default | Default | Under construction | The learning curve seems worse than the previous one, but the motion looks (visually) better. Needs more tests with a fixed speed |
Default | Default | Default | Random initialization of the robot state | (?) | (?) |
Default | Foot slippage | Default | Default | Work in progress (by Alex :)) | (?) |
Default | Air time | Default | Default | Done | Also converged quickly (around 400,000 steps); hard to quantify the improvement, but the locomotion looks good |
Default | Smooth action (difference between actions in consecutive time-steps) | Default | Default | (?) | (?) |
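A minimal sketch of the iteration-dependent velocity-command sampling from rows 2 and 3 of the table, assuming a linear ramp over training and an arbitrary spread; the schedule shape, `total_iters`, and the std values are assumptions, not values from the experiments.

```python
import numpy as np

def sample_target_velocity(iteration, total_iters=500_000):
    """Row 2 variant: command mean ramps 0 -> 2 m/s over training.

    Early commands near 0 let the policy learn to stand first;
    the ramp shape and std are illustrative assumptions.
    """
    progress = min(1.0, iteration / total_iters)
    mean = 2.0 * progress    # sampling around 0 -> sampling around 2
    std = 0.2                # assumed fixed spread
    return np.random.normal(mean, std)

def sample_target_velocity_row3(iteration, total_iters=500_000):
    """Row 3 variant: fixed mean 1 m/s, variance ramping 0 -> 1."""
    std = min(1.0, iteration / total_iters)
    return np.random.normal(1.0, std)
```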
- Observation space (see the sketch after this list)
  - Body: {orientation, velocity, angular velocity}
  - Joints: {position, velocity}
  - Robot height (z)
  - CPG feedback: {$\theta$, $r$, $\phi$, $\dot{\phi}$}
  - History: {last_action}
  - Distance to goal
  - Heading to goal
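A minimal sketch of how this observation vector might be assembled; every accessor name (`robot.orientation`, `cpg.theta`, `goal.distance`, ...) is a hypothetical placeholder for whatever the simulator actually exposes, and the ordering is illustrative.

```python
import numpy as np

def build_observation(robot, cpg, goal, last_action):
    """Concatenate the observation terms listed above into one vector."""
    return np.concatenate([
        robot.orientation,        # body orientation
        robot.lin_vel,            # body linear velocity
        robot.ang_vel,            # body angular velocity
        robot.joint_pos,          # joint positions
        robot.joint_vel,          # joint velocities
        [robot.height_z],         # robot height (z)
        cpg.theta, cpg.r,         # CPG amplitudes/states
        cpg.phi, cpg.phi_dot,     # CPG phases and phase rates
        last_action,              # history: last action
        [goal.distance],          # distance to goal
        [goal.heading],           # heading to goal
    ])
```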
- Reward (see the sketch after this list)
  - Distance increment to goal: $10 * ||d_{last} - d_{current}||^2$
  - Direction to goal: $-0.01 * |\theta_{goal}|$
  - Survival reward: $0.01t$
  - Velocity in z: $-||v_z||^2$
  - Roll/pitch: $-0.005 * ||[\omega_{\theta}, \omega_{\phi}]||^2$
  - Smooth action: $-0.05 * ||a_t - a_{t-1}||^2$
  - Energy: $-0.001 * \text{energy} * dt$
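A minimal sketch of the total reward, using the coefficients listed above; the argument names are hypothetical (`omega_rp` stands for the $[\omega_{\theta}, \omega_{\phi}]$ roll/pitch rate vector), and how `energy` is measured is not specified in these notes.

```python
import numpy as np

def compute_reward(d_last, d_current, theta_goal, t, v_z,
                   omega_rp, action, last_action, energy, dt):
    """Sum of the reward terms listed above."""
    r = 0.0
    r += 10.0 * (d_last - d_current) ** 2                 # distance increment to goal
    r += -0.01 * abs(theta_goal)                          # direction to goal
    r += 0.01 * t                                         # survival reward
    r += -v_z ** 2                                        # velocity-in-z penalty
    r += -0.005 * np.sum(np.square(omega_rp))             # roll/pitch penalty
    r += -0.05 * np.sum(np.square(action - last_action))  # smooth-action penalty
    r += -0.001 * energy * dt                             # energy penalty
    return r
```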
- Curriculum learning (see the sketch after this list)
  - New coefficient $k_l = \min(1, e^{\frac{T - 500000}{50000}})$
  - $k_l$ multiplies the following reward terms to make them zero at the beginning: velocity in z, roll/pitch, direction, last action
  - $(1 - k_l)$ multiplies the survival reward to make it near zero after 500,000 timesteps
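A minimal sketch of the curriculum coefficient and how it gates the reward terms, following the formula above; `shaped_reward` and its argument names are hypothetical helpers, not the experiments' actual code.

```python
import math

def curriculum_coeff(T, t0=500_000, tau=50_000):
    """k_l = min(1, exp((T - t0) / tau)).

    k_l is near zero early in training (exp(-10) at T = 0) and
    saturates at 1 around T = t0 = 500,000 timesteps.
    """
    return min(1.0, math.exp((T - t0) / tau))

def shaped_reward(T, r_vz, r_rollpitch, r_direction, r_smooth, r_survival):
    # k_l fades the shaping penalties in, while (1 - k_l) fades
    # the survival reward out after 500,000 timesteps.
    k_l = curriculum_coeff(T)
    return (k_l * (r_vz + r_rollpitch + r_direction + r_smooth)
            + (1.0 - k_l) * r_survival)
```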