bhairavmehta95/data-efficient-hrl

About the implementation of the off policy corrections

Closed this issue · 3 comments

Thanks for providing the code of HIRO.
I've got a question about the implementation of the off-policy corrections function.
What does the comment "# TODO: Doesn't include subgoal transitions!!" mean, and why does this function return the subgoal directly? Is there anything wrong with the sampling of the candidate goals? Thanks :).

I have the same problem as you. Have you solved it?

Oh man - I never even saw this :( I apologize.

What does the comment "# TODO: Doesn't include subgoal transitions!!" mean,

When I open-sourced this code, I was actually working in a slightly different domain (images, not states), so Equation 2 was never implemented in this code (the subgoal transition in Eq. 2 doesn't make sense in a latent space). So, to use this correctly, you'd have to actually transition the subgoal in the main loop.
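
For reference, the subgoal transition in the HIRO paper is h(s_t, g_t, s_{t+1}) = s_t + g_t - s_{t+1}, which keeps the absolute target s_t + g_t fixed while the agent moves. Below is a minimal sketch of what transitioning the subgoal in the main loop could look like. It assumes state-space goals with the same dimension as the state; `subgoal_transition` and the toy rollout are hypothetical names for illustration, not code from this repo.

```python
import numpy as np

def subgoal_transition(state, subgoal, next_state):
    """Goal transition h(s_t, g_t, s_{t+1}) = s_t + g_t - s_{t+1}.

    The subgoal is a relative offset, so it is re-expressed relative
    to the new state; the absolute target stays where it was.
    """
    return state + subgoal - next_state

# Toy rollout (assumption: a 2-D state that moves by +1 per step,
# standing in for the real env.step call).
state = np.array([0.0, 0.0])
subgoal = np.array([3.0, 4.0])   # relative goal proposed by the high level
target = state + subgoal         # absolute position the subgoal points at

for _ in range(3):
    next_state = state + np.array([1.0, 1.0])
    # Transition the subgoal between high-level decisions.
    subgoal = subgoal_transition(state, subgoal, next_state)
    state = next_state
```

In a real training loop the transitioned subgoal would be the one fed to the low-level policy (and stored in the low-level replay buffer) until the high-level policy emits a new one.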


and why does this function return the subgoal directly, is there anything wrong with the sampling of the candidate goal?

Nope. Because I didn't add the subgoal transition above, I just added the subgoals directly. But if we add it, it works fine.
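
To make that concrete, here is a rough sketch of how the off-policy correction from the paper could look once the subgoal transition is included. It is not this repo's function: `relabel_goal`, `low_policy`, `n_samples`, and `sigma` are hypothetical, and it assumes a deterministic low-level policy and goals with the same dimension as the state. The candidates are the original goal, s_{t+c} - s_t, and Gaussian samples around s_{t+c} - s_t; each is scored by how well it explains the stored low-level actions.

```python
import numpy as np

def relabel_goal(states, actions, orig_goal, low_policy,
                 n_samples=8, sigma=0.5, rng=None):
    """Pick the candidate goal that best explains the low-level actions.

    states:  (c + 1, d) array holding s_t ... s_{t+c}
    actions: (c, a) array of low-level actions taken
    low_policy(state, goal) -> action  (hypothetical deterministic policy)
    """
    rng = np.random.default_rng() if rng is None else rng
    diff = states[-1] - states[0]
    sampled = rng.normal(diff, sigma, size=(n_samples, diff.shape[0]))
    candidates = np.vstack([orig_goal, diff, sampled])

    best_goal, best_score = None, -np.inf
    for g0 in candidates:
        goal, score = g0, 0.0
        for s, a, s_next in zip(states[:-1], actions, states[1:]):
            # Log-probability up to a constant: -0.5 * ||a - mu_lo(s, g)||^2
            score -= 0.5 * np.sum((a - low_policy(s, goal)) ** 2)
            # Subgoal transition, applied along the stored trajectory.
            goal = s + goal - s_next
        if score > best_score:
            best_goal, best_score = g0, score
    return best_goal

# Toy check: a policy whose action simply equals the current goal.
toy_policy = lambda s, g: g
states = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
actions = np.array([[2.0, 2.0], [1.0, 1.0]])   # produced by goal [2, 2]
relabeled = relabel_goal(states, actions, np.array([5.0, -3.0]),
                         toy_policy, rng=np.random.default_rng(0))
```

In this toy case the stale goal [5, -3] is replaced by s_{t+c} - s_t = [2, 2], the only candidate that reproduces the stored actions exactly.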

OK, thanks for your comment.