Large tracking error with PPO learned policy
jc-bao opened this issue · 4 comments
jc-bao commented
Slow down the trajectory
Initial values:
- A1 = 0.8, w1 = 1.5, a1_max = 1.8 m/s^2
- A2 = 0.8, w2 = 3.0, a2_max = 7.2 m/s^2
Now:
- a1_max = 0.45 m/s^2
- a2_max = 1.8 m/s^2 (see the sketch below for how a_max follows from A and w)
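Assuming the reference trajectory is a sum of sinusoidal components of the form A·sin(w·t) (an assumption; the issue lists only A, w, and a_max), each component's peak acceleration is A·w², which reproduces the numbers above. A minimal sketch:

```python
def peak_accel(amplitude: float, omega: float) -> float:
    """Peak acceleration of x(t) = A * sin(w * t), i.e. A * w**2."""
    return amplitude * omega**2

# Initial components (A1 = A2 = 0.8):
print(peak_accel(0.8, 1.5))   # 1.8 m/s^2
print(peak_accel(0.8, 3.0))   # 7.2 m/s^2

# Slowed-down targets: the new a_max values are consistent with halving each w
# (w1 = 0.75 and w2 = 1.5 are inferred here, not stated in the issue).
print(peak_accel(0.8, 0.75))  # 0.45 m/s^2
print(peak_accel(0.8, 1.5))   # 1.8 m/s^2
```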
meshcat_1694285145189.tar.mp4
After training more steps:
meshcat_1694285495059.tar.mp4
Conclusion
- The tracking error is still relatively large (~10 cm).
- Need to check the dynamics issue (#12).
jc-bao commented
Simple reward engineering

```python
import jax.numpy as jnp

# Shaped tracking reward: constant base minus a velocity penalty, a linear
# position penalty, and four saturating log terms that add extra gradient
# close to zero position error.
reward = (0.9
          - 0.05 * err_vel
          - err_pos * 0.4
          - jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) * 0.4
          - jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) * 0.2
          - jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) * 0.1
          - jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1) * 0.1)
```
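To see what the stacked log-clip terms add over the plain linear position penalty, here is a minimal sketch (assuming `err_pos` is the position-error norm in metres) that evaluates just those terms at a few error magnitudes:

```python
import jax.numpy as jnp

def shaping_penalty(err_pos):
    """Sum of the four stacked log-clip terms from the reward above."""
    log_err = jnp.log(err_pos + 1)
    return (jnp.clip(log_err * 4, 0, 1) * 0.4
            + jnp.clip(log_err * 8, 0, 1) * 0.2
            + jnp.clip(log_err * 16, 0, 1) * 0.1
            + jnp.clip(log_err * 32, 0, 1) * 0.1)

for e in (0.01, 0.05, 0.10, 0.30):
    linear = 0.4 * e  # the plain linear position term, for comparison
    print(f"err_pos={e:.2f}  linear={linear:.3f}  "
          f"log-clip={float(shaping_penalty(jnp.asarray(e))):.3f}")
```

At 10 cm error the saturating terms already cost about 0.5 (versus 0.04 for the linear term alone) and they max out at 0.8 for large errors, so most of the achievable reward is concentrated near zero position error.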
meshcat_1694298987491.tar.mp4
jc-bao commented
Conclusion
- Problem resolved by the reward engineering above.