About the implementation of the off policy corrections

Question

Closed this issue 5 years ago · 3 comments

zhkmxx9302013 commented 5 years ago

Thanks for providing the code of HIRO.
I've got a question about the implementation of the off-policy corrections function.
What does the comment "# TODO: Doesn't include subgoal transitions!!" mean, and why does this function return the subgoal directly? Is there anything wrong with the sampling of the candidate goal? Thanks :).

Answer 1 · 2019-05-16T08:01:37.000Z

I have the same question. Have you solved it?

Answer 2 · 2019-06-19T19:58:48.000Z

Oh man - I never even saw this :( I apologize.

> What does this comment # TODO: Doesn't include subgoal transitions!! mean,

When I open-sourced this code, I was working in a slightly different domain (images, not states), so Equation 2 was never implemented in this code: the subgoal transition of Eq. 2 doesn't make sense in a latent space. To use this correctly, you'd have to actually transition the subgoal in the main loop.
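
A minimal sketch of what transitioning the subgoal in the main loop could look like, assuming Equation 2 refers to the HIRO goal transition h(s_t, g_t, s_{t+1}) = s_t + g_t - s_{t+1} and that states and subgoals are plain vectors in the same space. The function names are hypothetical and are not taken from this repository:

```python
def subgoal_transition(state, subgoal, next_state):
    """Re-express a relative subgoal after one environment step.

    Assumption: the subgoal is an offset in state space, so
    state + subgoal is the absolute target the low level should reach.
    """
    return [s + g - ns for s, g, ns in zip(state, subgoal, next_state)]


def rollout_subgoals(states, first_subgoal):
    """Return the subgoal in force at each state of one high-level interval."""
    subgoals = [list(first_subgoal)]
    for state, next_state in zip(states[:-1], states[1:]):
        # Carry the previous subgoal forward instead of reusing it unchanged.
        subgoals.append(subgoal_transition(state, subgoals[-1], next_state))
    return subgoals
```

With this transition, state + subgoal stays constant across the interval, so the low-level policy keeps aiming at the same absolute target as the agent moves.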

> and why does this function return the subgoal directly? Is there anything wrong with the sampling of the candidate goal?

Nope. Because I didn't add the subgoal transition described above, I just return the subgoals directly. If we add it, it works fine.
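
For context, here is a rough sketch of how the candidate-goal relabeling could look once subgoal transitions are in place, following the off-policy correction described in the HIRO paper: the candidates are the original subgoal, the observed state change, and a few Gaussian samples around that change, and the one that best explains the logged low-level actions is kept. Everything here is an assumption for illustration: `relabel_subgoal`, `low_level_policy`, `num_samples`, and `scale` are hypothetical names, and the policy is taken to be deterministic with a fixed-variance Gaussian likelihood.

```python
import random


def relabel_subgoal(states, actions, original_subgoal, low_level_policy,
                    num_samples=8, scale=1.0, rng=random):
    """Pick the candidate subgoal that best explains the logged actions.

    states has one more entry than actions. low_level_policy(state, subgoal)
    is a hypothetical callable returning the predicted action as a list.
    """
    # Observed displacement over the whole high-level interval.
    delta = [end - start for start, end in zip(states[0], states[-1])]
    candidates = [list(original_subgoal), delta]
    for _ in range(num_samples):
        candidates.append([rng.gauss(d, scale) for d in delta])

    def score(candidate):
        # Negative squared action error, proportional to the log-likelihood
        # under the assumed fixed-variance Gaussian policy.
        total, subgoal = 0.0, candidate
        for t, action in enumerate(actions):
            predicted = low_level_policy(states[t], subgoal)
            total -= 0.5 * sum((a - p) ** 2 for a, p in zip(action, predicted))
            # Transition the candidate so it keeps pointing at the same target.
            subgoal = [s + g - ns
                       for s, g, ns in zip(states[t], subgoal, states[t + 1])]
        return total

    return max(candidates, key=score)
```

Without the per-step transition inside `score`, each candidate would be compared against actions that were produced under a shifting subgoal, which is why returning the stored subgoal directly was the simpler choice when the transition was missing.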

Answer 3 · 2019-07-19T07:16:44.000Z

ok, thanks for your comment~