denisyarats/pytorch_sac

Question about implementation details

AdilZouitine opened this issue · 3 comments

Hey,

First, I want to thank you for your SAC implementation, which is of high quality.

I want to ask you about two implementation details that I don't understand well.

First, why do you use this constraint here instead of clamping log_std? How did you choose 2 and -5?

        # constrain log_std inside [log_std_min, log_std_max]
        log_std = torch.tanh(log_std)
        log_std_min, log_std_max = self.log_std_bounds
        log_std = log_std_min + 0.5 * (log_std_max - log_std_min) * (log_std + 1)
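For context, a minimal sketch of why this choice matters (hypothetical values, not code from the repo): a hard clamp has exactly zero gradient wherever it is active, so an out-of-range log_std stops receiving a learning signal, while the tanh rescaling above keeps a small but nonzero gradient everywhere.

    import torch

    log_std_min, log_std_max = -5.0, 2.0
    raw = torch.tensor([-8.0, 0.0, 4.0], requires_grad=True)

    # Hard clamp: the gradient is exactly zero wherever the clamp is active.
    raw.clamp(log_std_min, log_std_max).sum().backward()
    print(raw.grad)  # tensor([0., 1., 0.])

    raw.grad = None

    # tanh rescale: maps any input smoothly into (log_std_min, log_std_max),
    # so the gradient shrinks near the bounds but never vanishes entirely.
    squashed = torch.tanh(raw)
    rescaled = log_std_min + 0.5 * (log_std_max - log_std_min) * (squashed + 1)
    rescaled.sum().backward()
    print(raw.grad)  # small but nonzero at -8.0 and 4.0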

Second, why do you approximate atanh with 0.5 * (x.log1p() - (-x).log1p()) here instead of using torch.atanh()?
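As an aside, the log1p form is not actually an approximation: atanh(x) = 0.5 * (log(1 + x) - log(1 - x)) holds exactly, and log1p is just a numerically careful way to evaluate it. torch.atanh also only appeared in relatively recent PyTorch releases (1.5, if I recall correctly), so older code had to spell it out. A quick check:

    import torch

    x = torch.tensor([-0.999, -0.5, 0.0, 0.5, 0.999], dtype=torch.float64)

    # The log1p form and the built-in agree to floating-point precision.
    log1p_form = 0.5 * (x.log1p() - (-x).log1p())
    print(torch.allclose(log1p_form, torch.atanh(x)))  # True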

Thank you for your responses 😄

Hi Denis,

Thanks for your quick answer 😄

  1. Ok, that makes sense: with my current implementation, my log probs explode roughly every 1M steps (my current log_std_min is -20). Do you observe a lot of instability during training? Which trick did you use to avoid this issue, e.g., changing the log_std bounds, bounding log_std with tanh and an affine transformation instead of clipping, or changing the SquashedGaussian implementation? (See the sketch after this list.)
  2. Ok, it makes sense!
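To illustrate the explosion mentioned in point 1 (a hypothetical sketch, not code from this repo): with log_std_min = -20 the Gaussian collapses to a near-delta spike whose density at the mean is huge, and the tanh-squashing correction -log(1 - tanh(u)^2) in the squashed log prob diverges as the pre-squash action saturates, which compounds the problem.

    import math
    import torch
    from torch.distributions import Normal

    # Density at the mean grows as log_std shrinks: log N(0 | 0, e^log_std)
    # = -log_std - 0.5 * log(2 * pi), i.e. ~19 nats per dimension at -20.
    for log_std in (-5.0, -20.0):
        dist = Normal(loc=0.0, scale=math.exp(log_std))
        print(log_std, dist.log_prob(torch.tensor(0.0)).item())

    # The squashing correction -log(1 - tanh(u)^2) diverges as |u| grows;
    # in float32 it overflows to inf once tanh(u) rounds to exactly 1.
    u = torch.tensor([1.0, 5.0, 10.0])
    print(-torch.log1p(-torch.tanh(u) ** 2))  # ~[0.87, 8.6, inf]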

Thanks for your time

I've found my answer.