Calculate Entropy Tuning Loss in SAC/SAC Discrete
xingdi-eric-yuan opened this issue · 3 comments
xingdi-eric-yuan commented
Hi all,
I might have misunderstood, but shouldn't one use self.alpha rather than self.log_alpha HERE?
Thanks.
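For reference, here is a minimal PyTorch sketch of the two variants in question (variable names are illustrative, not the actual code being linked):

import torch

# Illustrative setup (not the linked implementation): alpha is parameterized
# through log_alpha so that alpha = exp(log_alpha) stays positive.
target_entropy = -1.0                            # e.g. -dim(A) for continuous SAC
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

# Stand-in for log pi(a|s) from the current policy (treated as a constant here).
log_pi = torch.randn(256, 1)

# Variant 1: loss written in terms of log_alpha.
alpha_loss_log = (-log_alpha * (log_pi + target_entropy).detach()).mean()

# Variant 2: loss written in terms of alpha = exp(log_alpha), mirroring the
# original softlearning loss. Its gradient w.r.t. log_alpha differs from
# Variant 1 only by the positive factor alpha, so both push alpha in the same
# direction and both gradients vanish only when E[log_pi + target_entropy] = 0.
alpha_loss_exp = (-log_alpha.exp() * (log_pi + target_entropy).detach()).mean()

alpha_optimizer.zero_grad()
alpha_loss_log.backward()   # or alpha_loss_exp.backward()
alpha_optimizer.step()

In either case, the value actually used to weight the entropy bonus in the actor/critic updates would be log_alpha.exp().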
Howuhh commented
Same question! Moreover, in the original SAC implementation, the authors use alpha, not log alpha:
with tf.GradientTape() as tape:
alpha_losses = -1.0 * (
self._alpha * tf.stop_gradient(log_pis + self._target_entropy))
# NOTE(hartikainen): It's important that we take the average here,
# otherwise we end up effectively having `batch_size` times too
# large learning rate.
alpha_loss = tf.nn.compute_average_loss(alpha_losses)
Howuhh commented
@xingdi-eric-yuan I think I found an explanation from the author of the other implementation:
toshikwa/sac-discrete.pytorch#3 (comment)
xingdi-eric-yuan commented
Thanks, @Howuhh! I recently compared using alpha vs. log alpha in that loss (in a discrete SAC setting), and I can confirm there is no noticeable difference in the agents' performance. Although I don't see a clear advantage to using log alpha (in terms of performance), at least it doesn't hurt.
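Roughly, the two discrete-SAC variants being compared look like this (a sketch with made-up names, not the exact code from either repo):

import torch

num_actions = 6
# Common heuristic for a discrete target entropy: a fraction of the maximum entropy.
target_entropy = 0.98 * torch.log(torch.tensor(float(num_actions)))
log_alpha = torch.zeros(1, requires_grad=True)

# Stand-in policy outputs for a batch of states (treated as constants here).
logits = torch.randn(256, num_actions)
probs = logits.softmax(dim=-1)
log_probs = logits.log_softmax(dim=-1)

# In the discrete case the expectation over actions is computed exactly,
# weighting by the action probabilities.
ent_term = (probs * (log_probs + target_entropy)).sum(dim=-1)

alpha_loss_log = (-log_alpha * ent_term.detach()).mean()        # log alpha variant
alpha_loss_exp = (-log_alpha.exp() * ent_term.detach()).mean()  # alpha variant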