pranz24/pytorch-soft-actor-critic

A question in the deterministic case

roosephu opened this issue · 3 comments

https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L87

Should we use new_action here, or self.policy(next_state_batch)?
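To make the question concrete, here is a toy contrast of the two alternatives, assuming the linked line computes the bootstrapped Q value at the next state. The names (policy, q_target, the batch tensors) are stand-ins invented for this sketch, not the actual variables in sac.py:

```python
import torch
import torch.nn as nn

# Toy stand-ins purely for illustration; shapes and names are made up.
policy   = nn.Linear(3, 2)        # actor network
q_target = nn.Bilinear(3, 2, 1)   # target critic Q(s, a)

state_batch      = torch.randn(4, 3)
next_state_batch = torch.randn(4, 3)

new_action = policy(state_batch)  # action selected for the *current* states

# Option A: bootstrap Q at the next state with the current-state action.
q_next_a = q_target(next_state_batch, new_action)

# Option B: re-evaluate the policy at the next states before bootstrapping.
q_next_b = q_target(next_state_batch, policy(next_state_batch))
```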

  1. Correct, I also think we should use self.policy.evaluate(next_state_batch).
  2. I am still using a Gaussian policy rather than a deterministic policy + fixed Gaussian noise, as given in the paper.
  3. Also, you will have to remove the entropy term from the policy loss, i.e., policy_loss = -(expected_new_q_value).mean() (the same as the DDPG policy loss). This means that we will no longer require the regularization loss. (A rough sketch of the resulting update follows below.)

(Although I have not given your question a lot of thought, these 3 points seemed very clear to me when I read the paper again today. I am very busy at the moment (at least this week), so if you can give me a week's time, I might get back to you with a bit more information. Also, I have no idea why I made these mistakes -_- )
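For reference, here is a minimal, self-contained sketch of what the deterministic-variant update looks like once points 1-3 are applied, assuming a single critic and no separate value network for brevity. All module and variable names below (policy, q_net, q_target_net, mask_batch, noise_std, select_action, etc.) are placeholders invented for this sketch, not this repository's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins; the real implementation uses deeper MLP networks.
state_dim, action_dim, gamma, noise_std = 3, 2, 0.99, 0.1

policy       = nn.Linear(state_dim, action_dim)         # deterministic actor mu(s)
q_net        = nn.Bilinear(state_dim, action_dim, 1)    # critic Q(s, a)
q_target_net = nn.Bilinear(state_dim, action_dim, 1)    # target critic
q_target_net.load_state_dict(q_net.state_dict())

q_optim      = torch.optim.Adam(q_net.parameters(), lr=3e-4)
policy_optim = torch.optim.Adam(policy.parameters(), lr=3e-4)

# A fake batch; in practice this comes from the replay buffer.
state_batch      = torch.randn(8, state_dim)
action_batch     = torch.randn(8, action_dim)
reward_batch     = torch.randn(8, 1)
next_state_batch = torch.randn(8, state_dim)
mask_batch       = torch.ones(8, 1)   # 0 where the episode terminated

# Point 1: bootstrap with the action the policy picks at the *next* state
# (the analogue of self.policy.evaluate(next_state_batch)); no entropy term.
with torch.no_grad():
    next_action = policy(next_state_batch)
    target_q    = reward_batch + gamma * mask_batch * q_target_net(next_state_batch, next_action)

# Critic update: plain TD error against the bootstrapped target.
q_loss = F.mse_loss(q_net(state_batch, action_batch), target_q)
q_optim.zero_grad()
q_loss.backward()
q_optim.step()

# Point 3: DDPG-style policy loss, no entropy term and no regularization loss.
expected_new_q_value = q_net(state_batch, policy(state_batch))
policy_loss = -(expected_new_q_value).mean()
policy_optim.zero_grad()
policy_loss.backward()
policy_optim.step()

# Point 2: when acting in the environment, exploration comes from fixed
# Gaussian noise added to the deterministic action, not from a learned Gaussian.
def select_action(state):
    with torch.no_grad():
        return policy(state) + noise_std * torch.randn(action_dim)
```

(The target network would still be updated with the usual soft/Polyak averaging; that part is unchanged and omitted here.)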

https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L90
I've made some changes based on your query.
Let me know if there is anything else that you think is wrong in the implementation.

Nice! Your code is really helpful, thanks!