seungeunrho/minimalRL

Maybe a bug in SAC Implementation?

arthur-x opened this issue · 1 comment

a_prime, log_prob = self.forward(s_prime)

The actions are sampled w.r.t. s_prime; however, the Q-values are evaluated at s.

q1_val, q2_val = q1(s,a_prime), q2(s,a_prime)

This doesn't match. Is this a bug?

Oh my, you are definitely right.
Thanks, I updated the code.
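
The updated code is not shown in this thread. For reference, here is a minimal sketch of the consistency the issue asks for, assuming standard SAC: whichever state the action is sampled from must also be the state passed to the critics. The toy `policy`, `q1`, `q2`, and both helper functions are hypothetical stand-ins written in plain Python, not the repository's networks.

```python
import math

def policy(state):
    """Hypothetical stand-in for the policy: returns (action, log_prob)."""
    action = math.tanh(state)
    log_prob = -0.5 * action ** 2  # placeholder value, not a real log-density
    return action, log_prob

def q1(state, action):
    """Hypothetical stand-in for the first critic."""
    return -(state - action) ** 2

def q2(state, action):
    """Hypothetical stand-in for the second critic."""
    return -(state - action) ** 2 - 0.1 * action ** 2

def policy_objective(s, alpha=0.2):
    # Policy update: the action is sampled from s, and the critics see the same s.
    a, log_prob = policy(s)
    entropy = -alpha * log_prob
    min_q = min(q1(s, a), q2(s, a))
    return min_q + entropy

def td_target(r, s_prime, done_mask, gamma=0.99, alpha=0.2):
    # Critic target: the action is sampled from s_prime, and the critics see s_prime.
    a_prime, log_prob = policy(s_prime)
    entropy = -alpha * log_prob
    min_q = min(q1(s_prime, a_prime), q2(s_prime, a_prime))
    return r + gamma * done_mask * (min_q + entropy)
```

The mismatch reported above corresponds to calling `policy(s_prime)` and then `q1(s, a_prime)`: the critic would score an action that was chosen for a different state.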