seungeunrho/minimalRL

Typo in actor_critic.py?

seungwonpark opened this issue · 1 comment

Hi, @seungeunrho
Thanks for creating a wonderful repo like this.

I think I've spotted a typo in actor_critic.py, but since I'm new to RL, I'm not sure whether it's actually a typo.

Shouldn't https://github.com/seungeunrho/minimalRL/blob/master/actor_critic.py#L48 be using s_prime_lst instead of s_prime? In other words,

s_batch, a_batch, r_batch, s_prime_batch, done_batch = torch.tensor(s_lst, dtype=torch.float), torch.tensor(a_lst), \
                                                       torch.tensor(r_lst, dtype=torch.float), torch.tensor(s_prime, dtype=torch.float), \
                                                       torch.tensor(done_lst, dtype=torch.float)

should be

s_batch, a_batch, r_batch, s_prime_batch, done_batch = torch.tensor(s_lst, dtype=torch.float), torch.tensor(a_lst), \
                                                       torch.tensor(r_lst, dtype=torch.float), torch.tensor(s_prime_lst, dtype=torch.float), \
                                                       torch.tensor(done_lst, dtype=torch.float)

Interestingly, both of them work.

Oh my god... You are absolutely correct. It was a critical typo.
I've corrected the error and pushed the fix!

The reason the original code still works is the small value of n_rollout, which I originally set to 5.
Over 5 ticks, the agent's state may not change enough to make the value estimate noticeably wrong.
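
If it helps to see the mechanism, here is a minimal standalone sketch (not from the repo; the 4-dimensional states and the linear critic are just illustrative stand-ins) of why the typo runs without an error: torch.tensor(s_prime) holds only the single last next-state left over from the batch-building loop, so the critic produces one value that silently broadcasts across every step of the rollout. That is a reasonable approximation only while the states within the rollout stay similar.

import torch

# Hypothetical stand-ins: 4-dimensional states (as in CartPole) and a linear critic.
n_rollout, state_dim = 5, 4
s_prime_lst = [[0.1 * i] * state_dim for i in range(n_rollout)]  # per-step next-states
s_prime = s_prime_lst[-1]  # leftover loop variable that the typo picks up

v = torch.nn.Linear(state_dim, 1)  # stand-in for the critic self.v

buggy = v(torch.tensor(s_prime, dtype=torch.float))      # shape [1]: one value only
fixed = v(torch.tensor(s_prime_lst, dtype=torch.float))  # shape [5, 1]: one per step

r = torch.ones(n_rollout, 1)
print((r + 0.98 * buggy).shape)  # torch.Size([5, 1]) -- broadcasting hides the bug
print((r + 0.98 * fixed).shape)  # torch.Size([5, 1]) -- correct per-step targets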

If you increase n_rollout to 20 and run both versions of the code,
only the one you suggested works properly.

Thank you again.