Typo in actor_critic.py?
seungwonpark opened this issue · 1 comment
Hi, @seungeunrho
Thanks for creating a wonderful repo like this.
I think I've spotted a typo in actor_critic.py, but since I'm new to RL, I'm not sure whether it's actually a typo.
Shouldn't https://github.com/seungeunrho/minimalRL/blob/master/actor_critic.py#L48 be using s_prime_lst instead of s_prime? In other words,
s_batch, a_batch, r_batch, s_prime_batch, done_batch = torch.tensor(s_lst, dtype=torch.float), torch.tensor(a_lst), \
torch.tensor(r_lst, dtype=torch.float), torch.tensor(s_prime, dtype=torch.float), \
torch.tensor(done_lst, dtype=torch.float)
should be
s_batch, a_batch, r_batch, s_prime_batch, done_batch = torch.tensor(s_lst, dtype=torch.float), torch.tensor(a_lst), \
torch.tensor(r_lst, dtype=torch.float), torch.tensor(s_prime_lst, dtype=torch.float), \
torch.tensor(done_lst, dtype=torch.float)
Interestingly, both of them work.
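To make the difference concrete, here is a minimal sketch (with made-up rollout data, assuming a CartPole-style 4-dimensional observation) of what each version actually builds. s_prime_lst holds every next-state collected during the rollout, while s_prime is just the last next-state left over from the loop:

```python
import torch

n_rollout = 5
state_dim = 4  # e.g. a CartPole-style observation

# Hypothetical rollout data: one next-state per step.
s_prime_lst = [[float(t)] * state_dim for t in range(n_rollout)]
s_prime = s_prime_lst[-1]  # leftover loop variable: only the final next-state

correct = torch.tensor(s_prime_lst, dtype=torch.float)  # shape (5, 4): one row per step
buggy = torch.tensor(s_prime, dtype=torch.float)        # shape (4,): final step only

print(correct.shape)  # torch.Size([5, 4])
print(buggy.shape)    # torch.Size([4])
```

So the buggy version silently replaces every step's next-state with the final one, rather than failing with a shape error.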
Oh my god... You are absolutely correct. It was a critical typo.
I fixed the error and pushed it!
The original code only appears to work because of the small value of n_rollout, which I originally set to 5.
Over 5 ticks, the agent's state may not change enough for the wrong next-state to make the value estimate noticeably incorrect.
If you increase n_rollout to 20 and run both versions of the code, only the one you suggested works properly.
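A hedged sketch (with a toy linear critic standing in for the real network) of why the buggy version runs without any error: PyTorch broadcasting silently repeats the last next-state's value across every row of the TD target, which is only a small error when consecutive states barely change over 5 ticks:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_rollout, state_dim = 5, 4
v = nn.Linear(state_dim, 1)  # toy stand-in for the critic's value head

r_batch = torch.ones(n_rollout, 1)
done_batch = torch.ones(n_rollout, 1)
s_prime_batch = torch.randn(n_rollout, state_dim)  # correct: one row per step
s_prime_last = s_prime_batch[-1]                   # buggy: shape (4,), last step only

gamma = 0.98
td_correct = r_batch + gamma * v(s_prime_batch) * done_batch  # shape (5, 1)
td_buggy = r_batch + gamma * v(s_prime_last) * done_batch     # also (5, 1), via broadcasting!

# Every row of td_buggy uses the same (last) next-state value; only its
# final row matches td_correct. With short rollouts the states are similar,
# so the two targets are close -- which is why training still seemed fine.
print(td_correct.shape, td_buggy.shape)
```

Broadcasting the (1,)-shaped value of the single state against the (5, 1)-shaped reward batch is what hides the bug: the shapes line up, so nothing crashes.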
Thank you again.