There is a typo in N-step ReplayBuffer
mclearning2 opened this issue · 1 comment
mclearning2 commented
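In 08.rainbow.ipynb, there is a value, indices, that is never used. In sample_batch below, the sampled indices are stored in idxs, but the returned dict refers to indices: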
def sample_batch(self) -> Dict[str, np.ndarray]:
    idxs = np.random.choice(self.size, size=self.batch_size, replace=False)
    return dict(
        obs=self.obs_buf[idxs],
        next_obs=self.next_obs_buf[idxs],
        acts=self.acts_buf[idxs],
        rews=self.rews_buf[idxs],
        done=self.done_buf[idxs],
        # for N-step Learning
        # MC Check This Function is not used
        indices=indices,
    )
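A minimal sketch of one possible fix, assuming the intent is to return the sampled positions so that an N-step buffer can look up the same transitions. The only change to sample_batch is indices=idxs. The constructor and store method here are assumptions added so the example runs on its own; they are not copied from the notebook.

```python
from typing import Dict

import numpy as np


class ReplayBuffer:
    """Stand-in buffer; __init__ and store are assumed for illustration."""

    def __init__(self, obs_dim: int, size: int, batch_size: int = 32):
        self.obs_buf = np.zeros([size, obs_dim], dtype=np.float32)
        self.next_obs_buf = np.zeros([size, obs_dim], dtype=np.float32)
        self.acts_buf = np.zeros([size], dtype=np.float32)
        self.rews_buf = np.zeros([size], dtype=np.float32)
        self.done_buf = np.zeros([size], dtype=np.float32)
        self.max_size, self.batch_size = size, batch_size
        self.ptr, self.size = 0, 0

    def store(self, obs, act, rew, next_obs, done) -> None:
        # Write one transition at the current position, then advance (ring buffer).
        self.obs_buf[self.ptr] = obs
        self.next_obs_buf[self.ptr] = next_obs
        self.acts_buf[self.ptr] = act
        self.rews_buf[self.ptr] = rew
        self.done_buf[self.ptr] = done
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self) -> Dict[str, np.ndarray]:
        # Requires self.size >= self.batch_size because replace=False.
        idxs = np.random.choice(self.size, size=self.batch_size, replace=False)
        return dict(
            obs=self.obs_buf[idxs],
            next_obs=self.next_obs_buf[idxs],
            acts=self.acts_buf[idxs],
            rews=self.rews_buf[idxs],
            done=self.done_buf[idxs],
            # for N-step Learning: return the sampled positions themselves
            indices=idxs,
        )
```

With this change, the caller can read batch["indices"] and use it to index a second buffer that stores the N-step transitions at the same positions.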
mclearning2 commented
I'll open a pull request.