A question in the deterministic case

Question

roosephu opened this issue 6 years ago · 3 comments

https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L87

Should we use new_action here, or self.policy(next_state_batch)?

Answer 1 · 2018-11-27T09:49:12.000Z

Correct, I also think we should use self.policy.evaluate(next_state_batch) there.
I am still using a Gaussian policy rather than the deterministic policy with fixed Gaussian noise given in the paper.
You will also have to remove the entropy term from the policy loss, i.e., policy_loss = -(expected_new_q_value).mean() (the same as the DDPG policy loss). This means the regularization loss is no longer required.
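A rough sketch of how these points might fit together in the deterministic case. The tiny networks, the q_value helper, and the random batch tensors are hypothetical stand-ins for illustration, not the code in sac.py, and the policy here is called directly instead of through evaluate():

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
state_dim, action_dim, batch_size, gamma = 3, 2, 4, 0.99

# Hypothetical stand-ins for the repository's networks (assumption).
policy = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic = nn.Linear(state_dim + action_dim, 1)
critic_target = nn.Linear(state_dim + action_dim, 1)
critic_target.load_state_dict(critic.state_dict())


def q_value(net, state, action):
    """Evaluate Q(s, a) by concatenating state and action (illustrative helper)."""
    return net(torch.cat([state, action], dim=1))


# Random data standing in for a replay-buffer sample.
state_batch = torch.randn(batch_size, state_dim)
action_batch = torch.randn(batch_size, action_dim)
reward_batch = torch.randn(batch_size, 1)
next_state_batch = torch.randn(batch_size, state_dim)
mask_batch = torch.ones(batch_size, 1)  # 0 where the episode terminated

# Critic target: the action comes from the policy at the NEXT state,
# not from new_action, which is computed at the current state.
with torch.no_grad():
    next_action = policy(next_state_batch)
    next_q_value = reward_batch + mask_batch * gamma * q_value(
        critic_target, next_state_batch, next_action
    )

q_loss = nn.functional.mse_loss(
    q_value(critic, state_batch, action_batch), next_q_value
)

# Policy loss with the entropy term removed (same form as the DDPG policy loss).
new_action = policy(state_batch)
expected_new_q_value = q_value(critic, state_batch, new_action)
policy_loss = -(expected_new_q_value).mean()
```

With a stochastic Gaussian policy the policy loss would also carry a log-probability (entropy) term; dropping it is what reduces the update to the DDPG form.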

(I have not given your question a lot of thought, but these 3 points seemed very clear to me when I read the paper again today. I am very busy at the moment, at least this week, so if you can give me a week's time I might get back to you with a bit more information. Also, I have no idea why I made these mistakes -_- )

Answer 2 · 2018-12-04T20:47:17.000Z

https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L90
I've made some changes in response to your question.
Let me know if you think anything else is wrong in the implementation.

Answer 3 · 2018-12-05T23:58:46.000Z

Nice! Your code is really helpful, thanks!