p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch

Calculate Entropy Tuning Loss in SAC/SAC Discrete

xingdi-eric-yuan opened this issue · 3 comments

Hi all,

I might have misunderstood, but shouldn't one use self.alpha rather than self.log_alpha HERE?

Thanks.
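
For reference, a minimal sketch of the two variants being asked about (PyTorch; the names log_alpha, log_pi and target_entropy are illustrative, not necessarily the repo's exact attributes):

import torch

# The temperature is parameterised through its log so that
# alpha = exp(log_alpha) stays positive during optimisation.
log_alpha = torch.zeros(1, requires_grad=True)
alpha = log_alpha.exp()

log_pi = torch.tensor([-1.2, -0.7, -2.3])  # example log-probs of sampled actions
target_entropy = -1.0                      # example target entropy

# Variant currently in the repo: log_alpha appears in the loss.
loss_with_log_alpha = -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Variant suggested here (and used in the original SAC code): alpha appears instead.
loss_with_alpha = -(alpha * (log_pi + target_entropy).detach()).mean()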

Same question! Moreover, in the original SAC implementation the authors use alpha, not log alpha:

with tf.GradientTape() as tape:
    alpha_losses = -1.0 * (
        self._alpha * tf.stop_gradient(log_pis + self._target_entropy))
    # NOTE(hartikainen): It's important that we take the average here,
    # otherwise we end up effectively having `batch_size` times too
    # large learning rate.
    alpha_loss = tf.nn.compute_average_loss(alpha_losses)

@xingdi-eric-yuan I think I found an explanation from the author of the other implementation:
toshikwa/sac-discrete.pytorch#3 (comment)

Thanks, @Howuhh ! I recently compared using alpha vs. log alpha in that loss (in a discrete SAC setting), and I can confirm there is no noticeable difference in the agents' performance. Although I don't observe a clear advantage to using log alpha (in terms of performance), at least it does not hurt.
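
That empirical result is consistent with the gradients: with respect to log_alpha, the derivative of -log_alpha * c is -c, while the derivative of -exp(log_alpha) * c is -alpha * c, so the two losses only differ by the positive factor alpha (effectively a rescaled, adaptive step size), not in the direction of the update. A quick check (PyTorch, illustrative names):

import torch

log_alpha = torch.tensor([0.5], requires_grad=True)
c = torch.tensor([-0.8])  # stands in for the detached (log_pi + target_entropy) term

# Loss written with log_alpha directly.
loss_log = -(log_alpha * c).mean()
grad_log, = torch.autograd.grad(loss_log, log_alpha)

# Loss written with alpha = exp(log_alpha).
loss_exp = -(log_alpha.exp() * c).mean()
grad_exp, = torch.autograd.grad(loss_exp, log_alpha)

print(grad_log)             # -c = 0.8
print(grad_exp)             # -alpha * c = exp(0.5) * 0.8 ≈ 1.32
print(grad_exp / grad_log)  # alpha = exp(0.5) ≈ 1.65, a positive factor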