Large tracking error with PPO learned policy
jc-bao opened this issue · 4 comments
jc-bao commented
Slow down the trajectory
Initial values:
- A1 = 0.8, w1 = 1.5, a1_max = 1.8 m/s^2
- A2 = 0.8, w2 = 3.0, a2_max = 7.2 m/s^2
Now:
- a1_max = 0.45 m/s^2
- a2_max = 1.8 m/s^2 (see the sketch below for how a_max follows from A and w)
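Assuming the reference trajectory is a sum of sinusoidal components of the form A·sin(w·t) (an assumption; the issue lists only A, w, and a_max), each component's peak acceleration is A·w², which reproduces the numbers above. A minimal sketch:

```python
def peak_accel(amplitude: float, omega: float) -> float:
    """Peak acceleration of x(t) = A * sin(w * t), i.e. A * w**2."""
    return amplitude * omega**2

# Initial components (A1 = A2 = 0.8):
print(peak_accel(0.8, 1.5))   # 1.8 m/s^2
print(peak_accel(0.8, 3.0))   # 7.2 m/s^2

# Slowed-down targets: the new a_max values are consistent with halving each w
# (w1 = 0.75 and w2 = 1.5 are inferred here, not stated in the issue).
print(peak_accel(0.8, 0.75))  # 0.45 m/s^2
print(peak_accel(0.8, 1.5))   # 1.8 m/s^2
```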
meshcat_1694285145189.tar.mp4
After training more steps:
meshcat_1694285495059.tar.mp4
Conclusion
- The tracking error is still relatively large (~10 cm).
- Need to check the dynamics issue (#12).
jc-bao commented
Simple reward engineering

```python
import jax.numpy as jnp

# Shaped tracking reward: constant base minus a velocity penalty, a linear
# position penalty, and four saturating log terms that add extra gradient
# close to zero position error.
reward = (0.9
          - 0.05 * err_vel
          - err_pos * 0.4
          - jnp.clip(jnp.log(err_pos + 1) * 4, 0, 1) * 0.4
          - jnp.clip(jnp.log(err_pos + 1) * 8, 0, 1) * 0.2
          - jnp.clip(jnp.log(err_pos + 1) * 16, 0, 1) * 0.1
          - jnp.clip(jnp.log(err_pos + 1) * 32, 0, 1) * 0.1)
```
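To see what the stacked log-clip terms add over the plain linear position penalty, here is a minimal sketch (assuming `err_pos` is the position-error norm in metres) that evaluates just those terms at a few error magnitudes:

```python
import jax.numpy as jnp

def shaping_penalty(err_pos):
    """Sum of the four stacked log-clip terms from the reward above."""
    log_err = jnp.log(err_pos + 1)
    return (jnp.clip(log_err * 4, 0, 1) * 0.4
            + jnp.clip(log_err * 8, 0, 1) * 0.2
            + jnp.clip(log_err * 16, 0, 1) * 0.1
            + jnp.clip(log_err * 32, 0, 1) * 0.1)

for e in (0.01, 0.05, 0.10, 0.30):
    linear = 0.4 * e  # the plain linear position term, for comparison
    print(f"err_pos={e:.2f}  linear={linear:.3f}  "
          f"log-clip={float(shaping_penalty(jnp.asarray(e))):.3f}")
```

At 10 cm error the saturating terms already cost about 0.5 (versus 0.04 for the linear term alone) and they max out at 0.8 for large errors, so most of the achievable reward is concentrated near zero position error.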
meshcat_1694298987491.tar.mp4
jc-bao commented
Conclusion
- Problem resolved by the reward engineering above.