LeCAR-Lab/CoVO-MPC

๐Ÿ› Large tracking error with PPO learned policy

jc-bao opened this issue · 4 comments

jc-bao commented

Performance

A 30 cm tracking error is relatively large.
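For context, "tracking error" here presumably means the average position error along the rollout; a minimal sketch of such a metric (the `mean_tracking_error` helper and the `(x, y, z)` tuple format are illustrative assumptions, not the repo's API):

```python
import math

def mean_tracking_error(pos, ref):
    """Mean Euclidean distance between actual and reference positions.

    pos, ref: equal-length sequences of (x, y, z) tuples sampled
    along the rollout.
    """
    assert len(pos) == len(ref)
    dists = [math.dist(p, r) for p, r in zip(pos, ref)]
    return sum(dists) / len(dists)

# Example: drone off by 1 m at the second sample, on target at the first.
err = mean_tracking_error([(0, 0, 0), (1, 0, 0)],
                          [(0, 0, 0), (0, 0, 0)])  # 0.5 m
```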

Copy of ppo

meshcat_1694284071493.tar.mp4

Next step

  • Slow down the trajectory and try again.
  • Implement MPPI/MPC to track the trajectory.
jc-bao commented

Slow down the trajectory

initial values:
A1 = 0.8, w1 = 1.5, a1_max = 1.8 m/s^2
A2 = 0.8, w2 = 3.0, a2_max = 7.2 m/s^2

now:
a1_max = 0.45 m/s^2
a2_max = 1.8 m/s^2
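These numbers are consistent with a sinusoidal reference x(t) = A·sin(w·t), whose peak acceleration is A·w^2 (an assumption; the issue only lists amplitudes and frequencies). A quick check:

```python
def max_accel(amplitude, omega):
    """Peak acceleration of x(t) = A*sin(omega*t): |x''(t)| <= A*omega**2."""
    return amplitude * omega**2

# Initial trajectory: matches the values above.
a1_max = max_accel(0.8, 1.5)   # 1.8 m/s^2
a2_max = max_accel(0.8, 3.0)   # 7.2 m/s^2

# Slowed-down trajectory: halving each omega quarters the peak
# acceleration, recovering a1_max = 0.45 and a2_max = 1.8 m/s^2.
a1_max_slow = max_accel(0.8, 0.75)  # 0.45 m/s^2
a2_max_slow = max_accel(0.8, 1.5)   # 1.8 m/s^2
```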

Result
ppo

meshcat_1694285145189.tar.mp4

After training more steps:

meshcat_1694285495059.tar.mp4

Copy of Copy of ppo

Conclusion

  • The tracking error is still relatively large (~10 cm).
  • Need to check the dynamics issues (#12).
jc-bao commented

Others' PPO performance

This is the result reported in the APG paper:

image

This helps account for the PPO performance degradation observed here.

jc-bao commented

Simple reward engineering

image

    reward = 0.9 - \
        0.05 * err_vel - \
        0.4 * err_pos - \
        0.4 * jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) - \
        0.2 * jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) - \
        0.1 * jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) - \
        0.1 * jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1)
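The stacked clipped-log terms each saturate at a different error scale (jnp.clip(jnp.log(err + 1) * k, 0, 1) reaches 1 at err = e^(1/k) − 1, i.e. roughly 0.28, 0.13, 0.06, and 0.03 m for k = 4, 8, 16, 32), so the penalty keeps a strong gradient even as err_pos shrinks toward zero. A pure-Python mirror of the jnp expression for illustration (assuming err_pos and err_vel are non-negative scalar error norms):

```python
import math

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def reward(err_pos, err_vel):
    # Mirror of the jnp reward above: each clipped-log term saturates
    # at a different error scale, so small errors are still penalized
    # with a meaningful slope.
    return (0.9
            - 0.05 * err_vel
            - 0.4 * err_pos
            - 0.4 * clip(math.log(err_pos + 1) * 4, 0, 1)
            - 0.2 * clip(math.log(err_pos + 1) * 8, 0, 1)
            - 0.1 * clip(math.log(err_pos + 1) * 16, 0, 1)
            - 0.1 * clip(math.log(err_pos + 1) * 32, 0, 1))

# Perfect tracking yields the maximum reward of 0.9; reward decreases
# monotonically as position error grows.
r_perfect = reward(0.0, 0.0)  # 0.9
```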

plot
ppo

meshcat_1694298987491.tar.mp4
jc-bao commented

Conclusion

  • Problem resolved by reward engineering.