denisyarats/pytorch_sac

Question about implementation details

AdilZouitine opened this issue · 3 comments

Hey,

First, I want to thank you for your SAC implementation, which is of high quality.

I want to ask you about two implementation details that I don't understand well.

First, why do you use this constraint here instead of clamping log_std? How did you choose 2 and -5?

        # constrain log_std inside [log_std_min, log_std_max]
        log_std = torch.tanh(log_std)
        log_std_min, log_std_max = self.log_std_bounds
        log_std = log_std_min + 0.5 * (log_std_max - log_std_min) * (log_std + 1)
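For context, a minimal sketch of why this choice matters (hypothetical values, not code from the repo): a hard clamp has exactly zero gradient wherever it is active, so an out-of-range log_std stops receiving a learning signal, while the tanh rescaling above keeps a small but nonzero gradient everywhere.

    import torch

    log_std_min, log_std_max = -5.0, 2.0
    raw = torch.tensor([-8.0, 0.0, 4.0], requires_grad=True)

    # Hard clamp: the gradient is exactly zero wherever the clamp is active.
    raw.clamp(log_std_min, log_std_max).sum().backward()
    print(raw.grad)  # tensor([0., 1., 0.])

    raw.grad = None

    # tanh rescale: maps any input smoothly into (log_std_min, log_std_max),
    # so the gradient shrinks near the bounds but never vanishes entirely.
    squashed = torch.tanh(raw)
    rescaled = log_std_min + 0.5 * (log_std_max - log_std_min) * (squashed + 1)
    rescaled.sum().backward()
    print(raw.grad)  # small but nonzero at -8.0 and 4.0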

Second, why do you approximate atanh with 0.5 * (x.log1p() - (-x).log1p()) here instead of using torch.atanh()?
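As an aside, the log1p form is not actually an approximation: atanh(x) = 0.5 * (log(1 + x) - log(1 - x)) holds exactly, and log1p is just a numerically careful way to evaluate it. torch.atanh also only appeared in relatively recent PyTorch releases (1.5, if I recall correctly), so older code had to spell it out. A quick check:

    import torch

    x = torch.tensor([-0.999, -0.5, 0.0, 0.5, 0.999], dtype=torch.float64)

    # The log1p form and the built-in agree to floating-point precision.
    log1p_form = 0.5 * (x.log1p() - (-x).log1p())
    print(torch.allclose(log1p_form, torch.atanh(x)))  # True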

Thank you for your responses 😄

Hi Denis,

Thanks for your quick answer 😄

  1. Ok, that makes sense: with my current implementation, my log probs explode roughly every 1M steps (my current log_std_min is -20). Do you observe a lot of instability during training? Which trick did you use to avoid this issue, e.g., changing the log_std bounds, bounding log_std with tanh and an affine transformation instead of clipping, or changing the SquashedGaussian implementation? (See the sketch after this list.)
  2. Ok, it makes sense!
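To illustrate the explosion mentioned in point 1 (a hypothetical sketch, not code from this repo): with log_std_min = -20 the Gaussian collapses to a near-delta spike whose density at the mean is huge, and the tanh-squashing correction -log(1 - tanh(u)^2) in the squashed log prob diverges as the pre-squash action saturates, which compounds the problem.

    import math
    import torch
    from torch.distributions import Normal

    # Density at the mean grows as log_std shrinks: log N(0 | 0, e^log_std)
    # = -log_std - 0.5 * log(2 * pi), i.e. ~19 nats per dimension at -20.
    for log_std in (-5.0, -20.0):
        dist = Normal(loc=0.0, scale=math.exp(log_std))
        print(log_std, dist.log_prob(torch.tensor(0.0)).item())

    # The squashing correction -log(1 - tanh(u)^2) diverges as |u| grows;
    # in float32 it overflows to inf once tanh(u) rounds to exactly 1.
    u = torch.tensor([1.0, 5.0, 10.0])
    print(-torch.log1p(-torch.tanh(u) ** 2))  # ~[0.87, 8.6, inf]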

Thanks for your time

I've found my answer.