Farama-Foundation/D4RL

[Bug Report] maze2d rewards field is off-by-one


ssnl commented

Describe the bug

In the maze2d-XXXX-v1 datasets, rewards[t] indicates whether observations[t] has reached the goal. According to the MazeEnv code, however, it should indicate whether next_observations[t], i.e., observations[t+1], has reached the goal. The rewards field is therefore off by one.

In [353]: env = gym.make('maze2d-large-v1')

In [354]: ds = env.get_dataset()
load datafile: 100%|| 8/8 [00:12<00:00,  1.56s/it]

In [355]: env.get_target()
Out[355]: (7, 9)

In [356]: cur_obs_reaches_goal = np.linalg.norm(ds['observations'][:, :2] - env.get_target(), axis=-1) <= 0.5

In [357]: next_obs_reaches_goal = cur_obs_reaches_goal[1:]

In [358]: np.all(cur_obs_reaches_goal.astype(np.float32) == ds['rewards'])
Out[358]: True

In [359]: np.all(next_obs_reaches_goal.astype(np.float32) == ds['rewards'][:-1])  # should be True
Out[359]: False
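
The same check can be reproduced without downloading the dataset. The following is a minimal sketch on a synthetic trajectory, assuming a single contiguous episode, the (7, 9) target, and the 0.5 goal radius used in the session above. The helper name reward_alignment and the shift-by-one workaround are hypothetical illustrations, not part of D4RL, and the workaround ignores episode boundaries and timeouts.

```python
import numpy as np


def reward_alignment(observations, rewards, target, radius=0.5):
    """Return (matches_current, matches_next) for a sparse goal reward.

    matches_current: rewards[t] == [observations[t] is within radius of target]
    matches_next:    rewards[t] == [observations[t+1] is within radius of target]
    """
    dist = np.linalg.norm(observations[:, :2] - np.asarray(target), axis=-1)
    at_goal = (dist <= radius).astype(np.float32)
    matches_current = bool(np.all(at_goal == rewards))
    matches_next = bool(np.all(at_goal[1:] == rewards[:-1]))
    return matches_current, matches_next


# Synthetic trajectory (an assumption, not real dataset content):
# the agent moves along y = 9 from x = 4 to x = 7 in steps of 0.25.
target = (7.0, 9.0)
xs = np.arange(4.0, 7.25, 0.25)
observations = np.zeros((len(xs), 4), dtype=np.float64)  # (x, y, vx, vy)
observations[:, 0] = xs
observations[:, 1] = 9.0

# Rewards laid out as the issue describes: tied to the current observation.
dist = np.linalg.norm(observations[:, :2] - np.asarray(target), axis=-1)
buggy_rewards = (dist <= 0.5).astype(np.float32)
buggy_alignment = reward_alignment(observations, buggy_rewards, target)

# Hypothetical workaround: pair observations[t] with rewards[t + 1] and
# drop the final transition, which has no successor reward.
shifted_alignment = reward_alignment(observations[:-1], buggy_rewards[1:], target)

print("as stored:", buggy_alignment)
print("after shift:", shifted_alignment)
```

On this toy data the stored layout matches the current observation only, and the shifted layout matches the next observation only, which mirrors the In [358] and In [359] results.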

System Info

  • Reproduced on both macOS and Linux.
  • Installed via pip.

Checklist

  • I have checked that there is no similar issue in the repo (required)