Stupid issue

Question

Opened this issue 6 years ago · 1 comment

Hi

I implemented my own version of SAC, and the log probability of the policy sometimes went above 0 when using the version given in the paper.

According to what I read here (p. 6), I think the squashing correction should be added, not subtracted, since the pdf is multiplied by the determinant of the Jacobian under a change of variables.
But then this incentivises the agent to just set actions to 1 to get a low log pi.

I am pretty sure I am missing something here. Can you please explain how you arrived at the squashing correction given in the paper?

Answer 1 · 2018-10-12T21:39:06.000Z

Sorry, I got confused by some very basic things.
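
For anyone landing here with the same question, here is a rough sketch of where the sign comes from. It is not taken from the paper's code. If a = tanh(u) with u drawn from a Gaussian density mu, the change of variables gives pi(a) = mu(u) / |det(da/du)|, so the Jacobian determinant divides the density of u rather than multiplying it, and log pi(a) = log mu(u) - sum_i log(1 - tanh^2(u_i)). Since pi is a density and not a probability, log pi can legitimately be above 0. The one-dimensional NumPy check below uses hypothetical helper names (`gaussian_log_prob`, `squashed_log_prob`) and arbitrary example values for the mean and standard deviation.

```python
import numpy as np


def gaussian_log_prob(u, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at u."""
    return -0.5 * ((u - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def squashed_log_prob(u, mean, std):
    """Log-density of a = tanh(u), expressed through the pre-squash value u.

    The correction term log(1 - tanh(u)^2) is SUBTRACTED, because
    da/du = 1 - tanh(u)^2 and pi(a) = mu(u) / |da/du|.
    """
    log_det = np.log(1.0 - np.tanh(u) ** 2)
    return gaussian_log_prob(u, mean, std) - log_det


# Sanity check: with the subtracted correction, the density of a integrates
# to (approximately) 1 over the open interval (-1, 1).
mean, std = 0.3, 0.5  # arbitrary example values
a = np.linspace(-0.999999, 0.999999, 200001)
u = np.arctanh(a)
density = np.exp(squashed_log_prob(u, mean, std))
integral = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(a))  # trapezoid rule

# A narrow Gaussian already has a density above 1 at its mode, so log pi > 0.
log_pi_at_mode = squashed_log_prob(0.0, 0.0, 0.1)

# Near saturation the correction raises log pi relative to the Gaussian term.
log_pi_saturated = squashed_log_prob(2.0, 0.0, 1.0)
log_mu_saturated = gaussian_log_prob(2.0, 0.0, 1.0)

print(integral, log_pi_at_mode, log_pi_saturated - log_mu_saturated)
```

With the subtracted correction, pushing actions towards ±1 makes log pi larger, not smaller, so the entropy term penalises saturated actions instead of rewarding them.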