Denys88/rl_games

Entropy calculation for (tanh) transformed normal distribution - SquashedNormal

Closed this issue · 4 comments

I am trying to use the Squashed Normal distribution for training a PPO agent to bound the action space. For the SquashedNormal distribution, entropy is assumed to be equal to entropy of the base (Normal) distribution, which ignores the additional (E[log(d(tanh)/dx)]) term. Would using entropy of the underlying Normal distribution as a proxy (since entropy for the new distribution does not have a closed form) cause any stability issues?
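For concreteness, the missing term is the log-determinant of the tanh change of variables. A minimal sketch (assuming PyTorch; `TanhTransform` and `TransformedDistribution` are from `torch.distributions`) showing that the squashed log-prob differs from the base Normal's by exactly `log(1 - tanh(x)^2)`:

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

# Base Normal and its tanh-squashed counterpart; PyTorch's TanhTransform
# applies the log|d tanh/dx| correction inside log_prob automatically.
base = Normal(torch.tensor(0.0), torch.tensor(1.0))
squashed = TransformedDistribution(base, TanhTransform())

torch.manual_seed(0)
x = base.sample((5,))
y = torch.tanh(x)

# Manual change of variables: log p(y) = log p(x) - log(1 - tanh(x)^2)
manual = base.log_prob(x) - torch.log1p(-y.pow(2))
auto = squashed.log_prob(y)

# Note: squashed.entropy() has no closed form and is not implemented,
# which is exactly the issue raised here.
```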

Thank you!

Hi @khush3, in the SAC paper they estimate the entropy term as mean(-logprob).
https://spinningup.openai.com/en/latest/algorithms/sac.html has a few words about it. To be honest, I tried to understand why they used this estimate instead of the true entropy of the distribution (I checked, and the true entropy didn't work, but this wasn't explained in the paper :D). They also mention that training becomes unstable if you don't use the squashed normal.
BUT in PPO, in 99% of my configs, and especially in all the IsaacGym configs, the entropy coefficient is zero for continuous action spaces. Entropy is used only for reporting to TensorBoard; if everything is fine, it should go down.
It should not impact training at all unless you set a non-zero coefficient.

Btw, on average my tests showed that:
- no activation layer (tanh, for example) works best for mu;
- no activation layer works best for sigma as well: just treat the output as log(std);
- another option is softplus, treating the output as std, but it works worse.
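The two sigma parameterizations above can be sketched as follows (a hypothetical `raw` tensor stands in for the policy network's std head output; neither variant is rl_games' exact code):

```python
import torch
import torch.nn.functional as F

# Hypothetical raw network output for the std head.
raw = torch.tensor([-2.0, 0.0, 2.0])

# Option A (reported above to work best): interpret the output as log(std).
std_a = raw.exp()

# Option B (reported to work worse): softplus, interpreting the result as std.
std_b = F.softplus(raw)

# Both keep std strictly positive; they differ in scale and gradient behavior.
```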

If you have example where squashed normal works better I'd like to see it :)

Hi @Denys88,

Apologies for the delayed response. I think that mean(-log(p(x))) makes sense, since entropy (-sum p(x)log(p(x))) can be written as an expectation under x ~ p(x), namely E[-log(p(x))], so averaging -log(p(x)) over samples is a Monte Carlo estimate of entropy. I don't think the entropy of the squashed normal has a closed form, so directly using the entropy of the underlying distribution (the Normal distribution here) would be incorrect. For now, as you suggested, I am avoiding the use of entropy entirely for continuous control tasks.
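The Monte Carlo estimate described above can be checked numerically against the closed-form Gaussian entropy (a sketch, assuming PyTorch; the sample count is arbitrary):

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)
dist = Normal(torch.tensor(0.0), torch.tensor(2.0))

# Monte Carlo estimate of entropy: E_{x~p}[-log p(x)], averaged over samples.
samples = dist.sample((100_000,))
mc_entropy = (-dist.log_prob(samples)).mean()

# Closed form for a Gaussian: 0.5 * log(2 * pi * e * sigma^2)
closed = dist.entropy()
```

For the base Normal both agree, which is why mean(-logprob) is a reasonable proxy when (as for the squashed normal) no closed form exists.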

Thank you for sharing the insights from your experiments!

I'll let you know if I get telling results from my tests (e.g., using the squashed normal to try to obtain smoother outputs).

@khush3 oh, thanks for the E[-log p(x)] explanation. You are right.
Btw, here is a closed-form derivation for the entropy:
https://math.stackexchange.com/questions/4116762/is-there-a-closed-form-expression-for-entropy-on-tanh-transform-of-gaussian-rand
But you are right, there is no good formula.
If you take a look at the final formula (I believe I is the step function, and the inverse tanh is easy to calculate), theoretically we could use it instead of E[-log p].
To calculate the integral part, we could try to approximate the function inside with a Taylor series.
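Alternatively to a series approximation, the extra term itself, E[log(1 - tanh(x)^2)], can be estimated by Monte Carlo to see how far the base Normal's entropy is from the squashed one (a numerical sketch, not the Taylor approach discussed above):

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)
base = Normal(torch.tensor(0.0), torch.tensor(1.0))
x = base.sample((100_000,))

# Change of variables for y = tanh(x):
# H(y) = H(x) + E[log|d tanh(x)/dx|] = H(x) + E[log(1 - tanh(x)^2)]
correction = torch.log1p(-torch.tanh(x).pow(2)).mean()
h_squashed = base.entropy() + correction

# The correction is negative: squashing into (-1, 1) reduces entropy,
# so reporting the base Normal's entropy overestimates H(y).
```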

@khush3 closing this one. Feel free to create a new issue if you find any good results.