BY571/Soft-Actor-Critic-and-Extensions

understanding alpha learning

Closed · 1 comment

Hi there,
I am confused about how alpha is learned here:

alpha_loss = - (self.log_alpha.cpu() * (log_pis.cpu() + self.target_entropy).detach().cpu()).mean()

I think line 244 here should use alpha instead of self.log_alpha to compute alpha_loss. The dependency would then be self.log_alpha --> alpha --> alpha_loss, so that Adam optimizes self.log_alpha automatically through alpha.
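To make the difference concrete, here is a minimal sketch in plain Python that compares the gradient with respect to log_alpha under both formulations. It assumes alpha = exp(log_alpha) and treats (log_pis + target_entropy) as a constant, as the .detach() call does. The gradients are derived by hand rather than taken from autograd, and the helper name alpha_loss_grads and all numeric values are made up for illustration, not taken from the repository.

```python
import math


def alpha_loss_grads(log_alpha, log_pis, target_entropy):
    """Hand-derived d(loss)/d(log_alpha) for two alpha-loss formulations.

    Hypothetical helper for illustration only; it is not part of the repository.
    """
    # Mean of the detached term (log_pi + target_entropy), treated as a constant.
    mean_term = sum(lp + target_entropy for lp in log_pis) / len(log_pis)

    # Formulation 1: loss = -mean(log_alpha * term)
    # d(loss)/d(log_alpha) = -mean(term)
    grad_log_alpha_form = -mean_term

    # Formulation 2: loss = -mean(exp(log_alpha) * term)
    # d(loss)/d(log_alpha) = -exp(log_alpha) * mean(term)
    grad_alpha_form = -math.exp(log_alpha) * mean_term

    return grad_log_alpha_form, grad_alpha_form


# Made-up sample values, chosen only to exercise the function.
log_pis = [-1.2, -0.8, -1.5]
target_entropy = -2.0
log_alpha = math.log(0.2)

g_log, g_alpha = alpha_loss_grads(log_alpha, log_pis, target_entropy)
print("gradient with log_alpha in the loss:", g_log)
print("gradient with alpha in the loss:    ", g_alpha)
```

Under these assumptions the two gradients always share the same sign and differ only by the positive factor alpha, so both push log_alpha in the same direction, but the alpha form scales the step by the current value of alpha.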

Thanks.

Shuang

BY571 commented

You are right, thanks for mentioning it! I just updated the code.