Question about the paper
Closed this issue · 1 comments
hhhusiyi-monash commented
Hi
Thanks for your awesome work.
I have a question about the paper. I suspect Eq. 11 is missing a log applied to the policy with parameters θ’. Otherwise the equation looks odd, since the right-hand side would be the gradient of a constant, which is zero.
Best
C.
Hwhitetooth commented
Hi @hhhusiyi-monash,
Thank you for reaching out!
I think Eq. 11 is correct. Note that the gradient operator applies only to the numerator, so the RHS is not the gradient of a constant. You may want to check Eq. 3 in the PPO paper (https://arxiv.org/pdf/1707.06347.pdf) for further reference.
Best regards,
Zeyu
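
For readers following the thread, the point in the reply can be written out as a short identity. This is only a sketch in assumed notation, not a copy of the paper's Eq. 11: $\pi_{\theta}$ stands for the fixed policy that collected the data, $\pi_{\theta'}$ for the policy being differentiated, and $(s, a)$ for a sampled state-action pair. Because the denominator does not depend on $\theta'$, the gradient passes through to the numerator only, and the log-derivative trick recovers the form with a log:

```latex
\nabla_{\theta'} \frac{\pi_{\theta'}(a \mid s)}{\pi_{\theta}(a \mid s)}
  = \frac{\nabla_{\theta'} \pi_{\theta'}(a \mid s)}{\pi_{\theta}(a \mid s)}
  = \frac{\pi_{\theta'}(a \mid s)}{\pi_{\theta}(a \mid s)}
    \, \nabla_{\theta'} \log \pi_{\theta'}(a \mid s)
```

So the ratio form and the log form differ only by the importance weight $\pi_{\theta'} / \pi_{\theta}$, which equals 1 when $\theta' = \theta$. In neither form is the right-hand side the gradient of a constant.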