Question about implementation details
AdilZouitine opened this issue · 3 comments
AdilZouitine commented
Hey,
First, I want to thank you for your high-quality SAC implementation.
I'd like to ask about two implementation details that I don't fully understand.
First, why do you use this constraint here
<https://github.com/denisyarats/pytorch_sac/blob/master/agent/actor.py#L77>
instead of clamping the log_std? How did you choose 2 and -5?
# constrain log_std inside [log_std_min, log_std_max]
log_std = torch.tanh(log_std)
log_std_min, log_std_max = self.log_std_bounds
log_std = log_std_min + 0.5 * (log_std_max - log_std_min) * (log_std + 1)
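For context, the difference between a hard clamp and the tanh rescaling above can be sketched as follows (a minimal, self-contained comparison; the bounds -5 and 2 match the snippet, and the raw values are made up for illustration):

```python
import torch

log_std_min, log_std_max = -5.0, 2.0
raw = torch.tensor([-10.0, 0.0, 10.0])  # unbounded network output

# Hard clamp: saturates exactly at the bounds, with zero gradient
# wherever the raw value lies outside [log_std_min, log_std_max]
clamped = raw.clamp(log_std_min, log_std_max)

# tanh rescaling: smoothly maps any raw value into [log_std_min, log_std_max],
# keeping a nonzero gradient away from the saturated region
squashed = torch.tanh(raw)
rescaled = log_std_min + 0.5 * (log_std_max - log_std_min) * (squashed + 1)

print(clamped)   # ≈ [-5.0, 0.0, 2.0]
print(rescaled)  # ≈ [-5.0, -1.5, 2.0]
```

The midpoint case shows the difference: a raw value of 0 stays at 0 under clamping but is mapped to the interval's midpoint -1.5 under the tanh rescaling.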
Second, why do you approximate atanh
with 0.5 * (x.log1p() - (-x).log1p())
here
<https://github.com/denisyarats/pytorch_sac/blob/master/agent/actor.py#L22>
instead of using torch.atanh()?
Thank you for your responses!
denisyarats commented
Hi Adil,
1. This is just to control that the variance doesn't vanish/explode. 2 and -5
are just hyperparameters that I chose by trial and error.
2. IIRC, torch.atanh was not available when I implemented this.
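As a side note, 0.5 * (x.log1p() - (-x).log1p()) is not really an approximation: it is the exact identity atanh(x) = 0.5 * log((1 + x) / (1 - x)) written with log1p for better precision near x = 0. A quick sanity check against the builtin, assuming a PyTorch version where torch.atanh exists (1.7+):

```python
import torch

x = torch.tensor([-0.9, -0.5, 0.0, 0.5, 0.9])

# log1p-based form: atanh(x) = 0.5 * (log(1 + x) - log(1 - x))
manual = 0.5 * (x.log1p() - (-x).log1p())

builtin = torch.atanh(x)
print(torch.allclose(manual, builtin))  # True
```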
AdilZouitine commented
Hi Denis,
Thanks for your quick answer!
1. Ok, it makes sense: with my current implementation (log_std_min equal to -20), my log probs explode roughly every 1M steps... Do you observe a lot of instability during training? Which trick did you use to avoid this issue? e.g., changing the log_std bounds, bounding log_std with tanh and an affine transformation instead of clipping, or changing the SquashedGaussian implementation.
2. Ok, it makes sense!
Thanks for your time.
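To illustrate where the instability comes from (a hypothetical sketch, not the repo's code; squashed_log_prob is a made-up helper): as log_std gets very negative, the density of a tanh-squashed Gaussian at its mode grows without bound, so log probs on the order of 20 appear once log_std reaches -20:

```python
import torch
from torch.distributions import Normal

def squashed_log_prob(log_std):
    """Log-prob of the mode of a tanh-squashed Gaussian with the given log_std."""
    std = torch.tensor(log_std).exp()
    dist = Normal(loc=torch.tensor(0.0), scale=std)
    u = dist.mean                 # evaluate at the mode for clarity
    a = torch.tanh(u)
    # change-of-variables correction for the tanh squashing
    return dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)

print(squashed_log_prob(-5.0).item())    # roughly 4.1
print(squashed_log_prob(-20.0).item())   # roughly 19.1: the peak density blows up as std -> 0
```

This is why a tighter lower bound like -5, or the tanh rescaling instead of a hard clamp, keeps the log probs in a saner range.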
AdilZouitine commented
I've found my answer.