[Bug Report] maze2d rewards field is off-by-one
ssnl commented
Describe the bug
In the `maze2d-XXXX-v1` datasets, `rewards[t]` equals whether `observations[t]` reaches the goal. However, according to the `MazeEnv` code, it should equal whether `next_observations[t]`, i.e., `observations[t+1]`, reaches the goal. This makes the rewards off by one.
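To make the expected alignment concrete, here is a small toy sketch that does not use D4RL at all. It assumes a hypothetical point-mass `toy_step` that, like the `MazeEnv` behaviour described above, moves first and then scores the *new* position against the target with a 0.5 radius. When transitions are logged as `(observations[t], rewards[t])`, the reward for entering the goal lands on the transition whose *next* observation is at the goal.

```python
import numpy as np

TARGET = np.array([7.0, 9.0])  # value returned by env.get_target() in this report
RADIUS = 0.5                   # goal radius used in the reproducer

def toy_step(position, action):
    """Hypothetical point-mass step: move first, then score the NEW position."""
    next_position = position + action
    reward = 1.0 if np.linalg.norm(next_position - TARGET) <= RADIUS else 0.0
    return next_position, reward

def rollout(start, actions):
    """Log (observation, reward) pairs the way an offline dataset is laid out."""
    observations, rewards = [], []
    position = np.asarray(start, dtype=np.float64)
    for action in actions:
        observations.append(position)      # observations[t]: state BEFORE the step
        position, reward = toy_step(position, np.asarray(action, dtype=np.float64))
        rewards.append(reward)             # rewards[t]: scored on observations[t + 1]
    observations.append(position)          # final state, kept only for comparison
    return np.array(observations), np.array(rewards)

obs, rew = rollout([4.0, 9.0], [[1.0, 0.0]] * 3)
# Goal test applied to every logged observation.
flags = (np.linalg.norm(obs - TARGET, axis=-1) <= RADIUS).astype(np.float64)
# rew matches flags[1:] (goal test on the next observation), not flags[:-1].
```

In this toy rollout `rew` is `[0., 0., 1.]`, which equals the goal test on `obs[1:]`; the alignment reported for the v1 datasets corresponds to `flags[:-1]`, which is all zeros here.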
```python
import gym
import d4rl  # importing d4rl registers the maze2d-* environments with gym
import numpy as np

env = gym.make('maze2d-large-v1')
ds = env.get_dataset()
# load datafile: 100%|█| 8/8 [00:12<00:00, 1.56s/it]

print(env.get_target())
# (7, 9)

# Goal test on each observation's (x, y) position, with a 0.5 goal radius.
cur_obs_reaches_goal = np.linalg.norm(ds['observations'][:, :2] - env.get_target(), axis=-1) <= 0.5
# Same test shifted by one step, i.e. applied to the next observation.
next_obs_reaches_goal = cur_obs_reaches_goal[1:]

print(np.all(cur_obs_reaches_goal.astype(np.float32) == ds['rewards']))
# True
print(np.all(next_obs_reaches_goal.astype(np.float32) == ds['rewards'][:-1]))  # should be True
# False
```
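Until the datasets are regenerated, one possible workaround is to recompute the rewards from the observations so that each reward scores the state the transition lands in. The sketch below is an assumption-laden illustration, not an official fix: `relabel_next_state_rewards` and its `episode_ends` argument are hypothetical names. It treats the first two observation columns as the (x, y) position and uses a 0.5 goal radius, both as in the reproducer, and it optionally drops transitions where `observations[t + 1]` is not the true successor (for example, wherever you judge an episode to end).

```python
import numpy as np

def relabel_next_state_rewards(observations, target, radius=0.5, episode_ends=None):
    """Recompute sparse rewards so that rewards[t] scores observations[t + 1].

    observations: (N, obs_dim) array; columns 0 and 1 are assumed to be (x, y).
    episode_ends: optional boolean (N,) array, True where observations[t + 1]
        does not follow observations[t] (hypothetical; build it yourself).
    Returns (rewards, keep): both of length N - 1; keep marks valid transitions.
    """
    pos = np.asarray(observations, dtype=np.float64)[:, :2]
    goal = np.asarray(target, dtype=np.float64)
    reaches = np.linalg.norm(pos - goal, axis=-1) <= radius
    # Transition t goes observations[t] -> observations[t + 1]; score the latter.
    rewards = reaches[1:].astype(np.float32)
    keep = np.ones(len(rewards), dtype=bool)
    if episode_ends is not None:
        # A transition that starts at an episode end has no valid successor.
        keep &= ~np.asarray(episode_ends, dtype=bool)[:-1]
    return rewards, keep
```

Applied to the reproducer, this would be called as `relabel_next_state_rewards(ds['observations'], env.get_target())`. The result is one element shorter than `ds['rewards']`, because the last observation has no successor in the array.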
System Info
- Reproduced on both macOS and Linux.
- Installed via pip.
Checklist
- I have checked that there is no similar issue in the repo (required)