There is a typo in N-step ReplayBuffer

Question

mclearning2 opened this issue 5 years ago · 1 comment

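In 08.rainbow.ipynb, sample_batch in the N-step ReplayBuffer returns a value, indices, that is never assigned. The sampled indices are stored in idxs instead.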

def sample_batch(self) -> Dict[str, np.ndarray]:
    idxs = np.random.choice(self.size, size=self.batch_size, replace=False)
    return dict(
        obs=self.obs_buf[idxs],
        next_obs=self.next_obs_buf[idxs],
        acts=self.acts_buf[idxs],
        rews=self.rews_buf[idxs],
        done=self.done_buf[idxs],
        # for N-step Learning
        # MC Check This Function is not used
        indices=indices,
    )

Answer 1 · 2019-08-14T05:22:21.000Z

I'll open a pull request.
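
For reference, here is a minimal sketch of what such a fix might look like. It assumes the intended change is simply to use one name for the sampled indices throughout sample_batch. The ReplayBuffer class below is a simplified stand-in written for illustration (its constructor and store method are assumptions, and the N-step logic is omitted); it is not the notebook's actual code.

```python
from typing import Dict

import numpy as np


class ReplayBuffer:
    """Simplified stand-in for the notebook's buffer (hypothetical; N-step logic omitted)."""

    def __init__(self, obs_dim: int, size: int, batch_size: int = 32):
        self.obs_buf = np.zeros([size, obs_dim], dtype=np.float32)
        self.next_obs_buf = np.zeros([size, obs_dim], dtype=np.float32)
        self.acts_buf = np.zeros([size], dtype=np.float32)
        self.rews_buf = np.zeros([size], dtype=np.float32)
        self.done_buf = np.zeros([size], dtype=np.float32)
        self.max_size, self.batch_size = size, batch_size
        self.ptr, self.size = 0, 0

    def store(self, obs, act, rew, next_obs, done) -> None:
        # Write one transition at the current position, then advance (ring buffer).
        self.obs_buf[self.ptr] = obs
        self.next_obs_buf[self.ptr] = next_obs
        self.acts_buf[self.ptr] = act
        self.rews_buf[self.ptr] = rew
        self.done_buf[self.ptr] = done
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self) -> Dict[str, np.ndarray]:
        # One name for the sampled indices, used for both indexing and the return value.
        indices = np.random.choice(self.size, size=self.batch_size, replace=False)
        return dict(
            obs=self.obs_buf[indices],
            next_obs=self.next_obs_buf[indices],
            acts=self.acts_buf[indices],
            rews=self.rews_buf[indices],
            done=self.done_buf[indices],
            # for N-step Learning: expose the indices that were sampled
            indices=indices,
        )
```

With this change the returned batch carries the same indices that were used to slice the buffers, so a caller could, for example, look up the matching transitions in a second buffer. Renaming the return entry to indices=idxs instead would work equally well.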