simoninithomas/Deep_reinforcement_learning_Course

N-step returns

slerman12 opened this issue · 0 comments

Do these algorithms compute n-step returns for reward propagation? The Sonic A2C code looks like it only uses 1-step returns, V(S) = R(S) + V(S_next), but it's hard to tell because I'm not very familiar with GAE.
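For reference, here is a minimal sketch of how n-step returns and GAE relate. It is not taken from the course code, and I have not checked it against the Sonic A2C implementation. The function names `n_step_returns` and `gae_advantages`, the parameters, and the toy numbers are all hypothetical. It assumes a single rollout with no episode termination inside it, and a value estimate `bootstrap_value` for the state that follows the last step.

```python
def n_step_returns(rewards, values, bootstrap_value, gamma=0.99, n=5):
    """n-step return target for each step of one rollout.

    Assumes no episode ends inside the rollout (hypothetical simplification).
    """
    T = len(rewards)
    # values_ext[t] is V(s_t); the last entry is V of the state after the rollout.
    values_ext = list(values) + [bootstrap_value]
    returns = []
    for t in range(T):
        end = min(t + n, T)  # truncate at the rollout boundary
        g = 0.0
        for k in range(t, end):
            g += (gamma ** (k - t)) * rewards[k]
        # Bootstrap from the value estimate of the state reached after the last reward.
        g += (gamma ** (end - t)) * values_ext[end]
        returns.append(g)
    return returns


def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """GAE advantages: discounted sum of 1-step TD errors, weighted by gamma * lam."""
    T = len(rewards)
    values_ext = list(values) + [bootstrap_value]
    advantages = [0.0] * T
    running = 0.0
    # Walk backwards so each step reuses the accumulated sum from the step after it.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Under these assumptions, `lam=0` reduces GAE to the 1-step TD error, so `advantage + V(S)` equals the 1-step return. With `lam=1` it equals the return over the rest of the rollout plus the bootstrap value. Values of `lam` in between give an exponentially weighted mix of n-step returns, which is why code using GAE can look like it only does 1-step updates while still propagating reward over many steps.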