slerman12 opened this issue 6 years ago · 0 comments
Do these algorithms compute n-step returns for reward propagation? The Sonic A2C code looks like it only uses 1-step returns, V(S) = R(S) + V(S_next), but it's hard to tell because I'm not very familiar with GAE.
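
To make the distinction concrete, here is a minimal sketch of bootstrapped n-step returns next to GAE advantages. It is not taken from the Sonic A2C code. The function names `n_step_returns` and `gae_advantages` are hypothetical, and the sketch assumes a single rollout with no episode termination inside it. As far as I understand GAE, `lam=0` should reduce to the 1-step target above (with a discount `gamma`), while `lam=1` should match the full n-step return over the rollout.

```python
def n_step_returns(rewards, values, last_value, gamma, n):
    """Bootstrapped n-step return for every step of one rollout.

    Assumes no episode boundary inside the rollout (no `done` handling).
    `values[t]` is V(s_t); `last_value` is V(s_T) for the state after the rollout.
    """
    T = len(rewards)
    boot = list(values) + [last_value]  # boot[t] = V(s_t) for t in 0..T
    returns = []
    for t in range(T):
        m = min(n, T - t)  # truncate the lookahead at the end of the rollout
        g = sum(gamma ** k * rewards[t + k] for k in range(m))
        g += gamma ** m * boot[t + m]  # bootstrap from the value estimate
        returns.append(g)
    return returns


def gae_advantages(rewards, values, last_value, gamma, lam):
    """Generalized Advantage Estimation via the usual backward recursion."""
    T = len(rewards)
    boot = list(values) + [last_value]
    adv = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        # 1-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * boot[t + 1] - boot[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

Adding `values[t]` to each GAE advantage gives the value target, so the two functions can be compared directly: with `lam=0.0` the targets equal `n_step_returns(..., n=1)`, and with `lam=1.0` they equal `n_step_returns(..., n=len(rewards))`. Intermediate values of `lam` give an exponentially weighted mix of the n-step returns.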