Possible misalignment in calculating rtg in Atari

Question

Closed this issue 3 years ago · 1 comments

Dear authors,
Thanks for your code! I found a possible error in how rtg (return-to-go) is built for Atari.
I think Line 86 should be `curr_traj_returns = stepwise_returns[start_index:i]`, and Line 88 should be `rtg_j = curr_traj_returns[j-start_index:i-start_index]`.
I'm not 100% sure about this.
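
To make the proposed indexing concrete, here is a minimal, self-contained sketch. Only the two slice expressions come from the lines quoted above. The function name `build_rtg`, the loop bounds, the `start_index = i` update, and the reading of `done_idxs` as exclusive end indices are assumptions for illustration, not a copy of the repository code.

```python
def build_rtg(stepwise_returns, done_idxs):
    """Per-step return-to-go, computed trajectory by trajectory.

    Assumption: each entry of done_idxs is the exclusive end index of a
    trajectory, i.e. the index of the first step of the next one.
    """
    rtg = [0] * len(stepwise_returns)
    start_index = 0
    for i in done_idxs:
        i = int(i)
        # Proposed Line 86: the slice stops before index i.
        curr_traj_returns = stepwise_returns[start_index:i]
        # Assumed loop: walk backwards over the steps of this trajectory.
        for j in range(i - 1, start_index - 1, -1):
            # Proposed Line 88: rewards from step j to the end of this trajectory.
            rtg_j = curr_traj_returns[j - start_index:i - start_index]
            rtg[j] = sum(rtg_j)
        start_index = i  # assumed: the next trajectory starts at i
    return rtg


# Toy data (hypothetical): two trajectories, steps 0-2 and steps 3-4.
rewards = [0, 1, 2, 0, 5]
done_idxs = [3, 5]
rtg = build_rtg(rewards, done_idxs)
print(rtg)  # [3, 3, 2, 5, 5]
```

With this indexing, no reward from the second trajectory leaks into the first trajectory's return-to-go.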

Best,
Tao

Answer 1 · 2021-08-25T04:57:31.000Z

I think you're right, thanks for the catch! We'll fix this soon. Fortunately, I think it doesn't affect performance much, since the first reward in a trajectory is typically zero.
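
The reasoning about the limited impact can be checked on toy data. The sketch below assumes the misalignment amounts to each trajectory's slice extending one step into the next trajectory. That is one reading of this thread, not a confirmed description of the original code. The helper `rtg_with_overrun` and its `overrun` parameter are hypothetical names.

```python
def rtg_with_overrun(stepwise_returns, done_idxs, overrun):
    """Return-to-go where each trajectory's sum runs `overrun` steps past its end.

    overrun=0 is the aligned version; overrun=1 models the assumed misalignment.
    done_idxs are assumed to be exclusive end indices.
    """
    rtg = [0] * len(stepwise_returns)
    start = 0
    for i in done_idxs:
        # Clamp so the last trajectory cannot read past the end of the data.
        end = min(i + overrun, len(stepwise_returns))
        for j in range(start, i):
            rtg[j] = sum(stepwise_returns[j:end])
        start = i
    return rtg


done_idxs = [3, 5]

# Second trajectory starts with a zero reward: the overrun adds nothing.
zero_first = [0, 1, 2, 0, 5]
same = rtg_with_overrun(zero_first, done_idxs, 0) == rtg_with_overrun(zero_first, done_idxs, 1)

# Second trajectory starts with a nonzero reward: the first trajectory's values shift.
nonzero_first = [0, 1, 2, 4, 5]
aligned = rtg_with_overrun(nonzero_first, done_idxs, 0)  # [3, 3, 2, 9, 5]
shifted = rtg_with_overrun(nonzero_first, done_idxs, 1)  # [7, 7, 6, 9, 5]

print(same, aligned, shifted)
```

Under that assumption, the two versions agree whenever the next trajectory opens with a zero reward, and differ by exactly that reward otherwise.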