Confusion about advantage computation
gunshi commented
Hey!
I'm a bit confused about why, in the code that computes advantages, the previous advantage value is set to the first env's advantage from the previous time step, i.e. advantages[i, 0]
(assuming advantages has shape (time_steps x num_envs x 1)).
Line 17 in 15b574f
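To make the question concrete, here is a minimal NumPy sketch of the per-environment recursion I would have expected. It assumes the function is meant to be generalized advantage estimation (GAE), which may not be what the repository implements. The names `compute_gae`, `gamma`, `lam`, `dones`, and `last_value` are hypothetical and not taken from the repository.

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Hypothetical GAE sketch, not the repository's function.

    rewards, values, dones: arrays of shape (time_steps, num_envs, 1)
    last_value: value estimate after the final step, shape (num_envs, 1)
    """
    time_steps = rewards.shape[0]
    advantages = np.zeros_like(rewards, dtype=np.float64)
    # Running advantage kept for every env, shape (num_envs, 1),
    # rather than a single value read from env 0.
    next_advantage = np.zeros_like(last_value, dtype=np.float64)
    next_value = last_value
    for t in reversed(range(time_steps)):
        not_done = 1.0 - dones[t]
        # One-step TD error for each env.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Discounted accumulation, reset where an episode ended.
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```

In this sketch the carried-over advantage is indexed per env, so each env only ever sees its own future rewards. Indexing it with advantages[i, 0] would instead propagate env 0's advantage into every other env.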
Could you link the source for the equations for this whole function?
Thanks!
Gunshi