Hwhitetooth/lirpg

Question about the paper

Hi

Thanks for your awesome work.

I have a question about the paper. I suspect Equation 11 is missing a log applied to the policy under θ’. Otherwise the equation looks odd, since the right-hand side would be the gradient of a constant, which is zero.

Best
C.

Hi @hhhusiyi-monash,

Thank you for reaching out!

I think Eq. 11 is correct. Note that the gradient operator applies only to the numerator, so the RHS is not the gradient of a constant. You may want to check Eq. 3 in the PPO paper (https://arxiv.org/pdf/1707.06347.pdf) for further reference.
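
The point about the numerator can be checked numerically. The sketch below is not taken from the paper or from the lirpg code. It assumes a toy 3-action softmax policy with made-up parameter values, and all function names are hypothetical. It differentiates the ratio π_θ’(a) / π_θ(a) with respect to θ’ only, holding the denominator fixed, and compares the result with the general identity ∇_θ’ (π_θ’ / π_θ) = (π_θ’ / π_θ) ∇_θ’ log π_θ’.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ratio(theta_new, theta_old, action):
    """pi_{theta'}(a) / pi_theta(a); theta_old is treated as a fixed constant."""
    return softmax(theta_new)[action] / softmax(theta_old)[action]


def grad_ratio_fd(theta_new, theta_old, action, eps=1e-6):
    """Central finite-difference gradient of the ratio w.r.t. theta_new only."""
    grad = []
    for i in range(len(theta_new)):
        up = list(theta_new)
        up[i] += eps
        down = list(theta_new)
        down[i] -= eps
        diff = ratio(up, theta_old, action) - ratio(down, theta_old, action)
        grad.append(diff / (2 * eps))
    return grad


def grad_log_pi(theta, action):
    """Analytic gradient of log softmax(theta)[action]: one_hot(action) - softmax(theta)."""
    probs = softmax(theta)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


# Hypothetical values chosen only for illustration.
theta = [0.2, -0.5, 1.0]        # parameters of the behaviour policy (denominator)
theta_prime = [0.3, -0.4, 0.7]  # updated parameters (numerator)
action = 2

# Gradient of the ratio, differentiating the numerator only.
fd = grad_ratio_fd(theta_prime, theta, action)

# Same quantity via the identity: ratio * grad log pi_{theta'}.
r = ratio(theta_prime, theta, action)
analytic = [r * g for g in grad_log_pi(theta_prime, action)]

# Special case theta' = theta: the ratio equals 1, but its gradient is the score.
fd_at_theta = grad_ratio_fd(theta, theta, action)
score_at_theta = grad_log_pi(theta, action)

print(fd, analytic)
print(fd_at_theta, score_at_theta)
```

Even when θ’ = θ and the ratio evaluates to 1, its gradient with respect to θ’ is nonzero, because only the numerator depends on θ’.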

Best regards,
Zeyu