Curt-Park/rainbow-is-all-you-need

There is a typo in N-step ReplayBuffer

mclearning2 opened this issue · 1 comment

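In 08.rainbow.ipynb, sample_batch in the N-step ReplayBuffer returns indices=indices, but no variable named indices is defined. The sampled indices are stored in idxs.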

def sample_batch(self) -> Dict[str, np.ndarray]:
        idxs = np.random.choice(self.size, size=self.batch_size, replace=False)
        return dict(
            obs=self.obs_buf[idxs],
            next_obs=self.next_obs_buf[idxs],
            acts=self.acts_buf[idxs],
            rews=self.rews_buf[idxs],
            done=self.done_buf[idxs],
            # for N-step Learning 
            # MC Check This Function is not used
            indices=indices,
        )
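
A minimal sketch of what the fix could look like, assuming the intent is to return the sampled idxs under the indices key. The constructor and store method below are hypothetical stand-ins so the snippet runs on its own; they are not copied from the notebook.

```python
from typing import Dict

import numpy as np


class ReplayBuffer:
    """Stripped-down buffer for illustration (hypothetical layout)."""

    def __init__(self, obs_dim: int, size: int, batch_size: int = 32):
        self.obs_buf = np.zeros([size, obs_dim], dtype=np.float32)
        self.next_obs_buf = np.zeros([size, obs_dim], dtype=np.float32)
        self.acts_buf = np.zeros([size], dtype=np.float32)
        self.rews_buf = np.zeros([size], dtype=np.float32)
        self.done_buf = np.zeros([size], dtype=np.float32)
        self.max_size, self.batch_size = size, batch_size
        self.ptr, self.size = 0, 0

    def store(self, obs, act, rew, next_obs, done) -> None:
        # Write one transition at the current pointer, then advance it
        # (wrapping around once the buffer is full).
        self.obs_buf[self.ptr] = obs
        self.next_obs_buf[self.ptr] = next_obs
        self.acts_buf[self.ptr] = act
        self.rews_buf[self.ptr] = rew
        self.done_buf[self.ptr] = done
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self) -> Dict[str, np.ndarray]:
        idxs = np.random.choice(self.size, size=self.batch_size, replace=False)
        return dict(
            obs=self.obs_buf[idxs],
            next_obs=self.next_obs_buf[idxs],
            acts=self.acts_buf[idxs],
            rews=self.rews_buf[idxs],
            done=self.done_buf[idxs],
            # for N-step Learning: return the sampled indices themselves
            indices=idxs,
        )
```

With this change the caller can reuse batch["indices"] to look up the same transitions in a second buffer.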

I'll open a pull request.