Question about the Estimator Policy

Question

Closed this issue 9 months ago · 2 comments

Great work, and thanks for open-sourcing it!
I noticed that the code uses an estimator to predict the privileged states (base_lin_vel) in the observations when training the teacher policy, but the estimator isn't used later when training the depth student policy or in play.py. My understanding is that you still need the estimator to predict the base linear velocity when deploying the policy on a real robot.
So why isn't the estimator used in the depth student training pipeline or in play.py? Or perhaps I have misunderstood the code.
Really looking forward to your explanation.

Answer 1 · 2023-10-25T23:57:11.000Z

You are right. In play.py the true velocity is used. Thanks for pointing this out. However, during training the predicted velocity is used in this line.

In save_jit.py the velocity is also predicted by the estimator, so deployment is correct as well.
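
To make the idea concrete, here is a minimal sketch of overwriting the privileged velocity slice of an observation with an estimator's prediction before the policy sees it. It is not the repository's code: the observation layout, the sizes `N_PROPRIO` and `N_PRIV`, and the names `with_estimated_velocity` and `toy_estimator` are all assumptions made up for illustration, and the toy estimator only stands in for a trained network.

```python
from typing import Callable, List, Sequence

# Hypothetical observation layout (an assumption, not the repository's layout):
# [ proprioception (N_PROPRIO) | privileged base_lin_vel (N_PRIV) | remaining terms ]
N_PROPRIO = 4
N_PRIV = 3

Estimator = Callable[[Sequence[float]], List[float]]


def with_estimated_velocity(obs: Sequence[float], estimator: Estimator) -> List[float]:
    """Return a copy of obs whose velocity slice holds the estimator's prediction."""
    proprio = obs[:N_PROPRIO]
    predicted = estimator(proprio)
    if len(predicted) != N_PRIV:
        raise ValueError(f"estimator returned {len(predicted)} values, expected {N_PRIV}")
    start, end = N_PROPRIO, N_PROPRIO + N_PRIV
    # Build a new list so the simulator's ground-truth observation is left untouched.
    return list(obs[:start]) + list(predicted) + list(obs[end:])


def toy_estimator(proprio: Sequence[float]) -> List[float]:
    """Stand-in for a trained estimator: repeats the mean of the proprioception."""
    mean = sum(proprio) / len(proprio)
    return [mean] * N_PRIV


# Ground-truth observation from a simulator: the 9.0 entries play the role of
# the true base linear velocity, which is not available on a real robot.
obs = [1.0, 2.0, 3.0, 4.0, 9.0, 9.0, 9.0, 0.5]
policy_input = with_estimated_velocity(obs, toy_estimator)
print(policy_input)
```

If an evaluation script feeds `obs` directly to the policy, it is using the true velocity; feeding `policy_input` instead matches what the policy would receive at deployment, where only the estimate exists.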

Answer 2 · 2023-10-26T07:08:53.000Z

Yeah, now I understand. Thanks again.