[RLlib] Incorrect Callback Order
What happened + What you expected to happen
In my project, I primarily use the DefaultCallbacks class from ray.rllib.algorithms.callbacks to implement curriculum learning.
In on_episode_created, I set the tasks for the new episode.
It is important for me that the env.reset() method is only called afterwards.
The docstring of DefaultCallbacks.on_episode_created says that this is exactly what happens:
This method gets called after a new Episode(V2) (old stack) or
SingleAgentEpisode/MultiAgentEpisode instance has been created.
This happens before the respective sub-environment's (usually a gym.Env)
`reset()` is called by RLlib.
However, in SingleAgentEnvRunner._sample_episodes(), env.reset() is called before the on_episode_created callback is made, not after it:
def _sample_episodes(
    self,
    num_episodes: int,
    explore: bool,
    random_actions: bool = False,
) -> List[SingleAgentEpisode]:
    """Helper method to run n episodes.
    See docstring of `self.sample()` for more details.
    """
    # If user calls sample(num_timesteps=..) after this, we must reset again
    # at the beginning.
    self._needs_initial_reset = True
    done_episodes_to_return: List[SingleAgentEpisode] = []
    # Reset the environment.
    # TODO (simon): Check, if we need here the seed from the config.
    ################ Reset ################
    obs, infos = self.env.reset()
    episodes = []
    for env_index in range(self.num_envs):
        episodes.append(self._new_episode())
        ################ Callback ################
        self._make_on_episode_callback("on_episode_created", env_index, episodes)
    _shared_data = {}
    for env_index in range(self.num_envs):
        episodes[env_index].add_env_reset(
            observation=obs[env_index],
            infos=infos[env_index],
        )
        self._make_on_episode_callback("on_episode_start", env_index, episodes)
[...]
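To show why the order matters for curriculum learning, here is a minimal sketch that does not use RLlib at all. ToyCurriculumEnv, on_episode_created, and start_episode are hypothetical stand-ins written for this illustration. They are not RLlib APIs, and the sketch assumes an env that fixes its task at reset() time.

```python
class ToyCurriculumEnv:
    """Hypothetical stand-in for a gym-style env that fixes its task in reset()."""

    def __init__(self) -> None:
        self.task = "easy"        # task left over from the previous episode
        self.active_task = None   # task the running episode actually uses

    def reset(self) -> str:
        # The task is frozen at reset time; later changes do not affect
        # the episode that is already running.
        self.active_task = self.task
        return self.active_task


def on_episode_created(env: ToyCurriculumEnv) -> None:
    """Hypothetical callback body: pick the task for the upcoming episode."""
    env.task = "hard"


def start_episode(env: ToyCurriculumEnv, callback, reset_first: bool) -> str:
    """Start one episode and return the task it actually runs with."""
    if reset_first:
        # Order observed in _sample_episodes(): reset, then callback.
        env.reset()
        callback(env)
    else:
        # Order described in the docstring: callback, then reset.
        callback(env)
        env.reset()
    return env.active_task


documented = start_episode(ToyCurriculumEnv(), on_episode_created, reset_first=False)
observed = start_episode(ToyCurriculumEnv(), on_episode_created, reset_first=True)
print(documented, observed)  # hard easy
```

With the documented order, the episode runs with the task chosen in the callback. With the observed order, the first episode still runs with the stale task, and the new task only takes effect at the next reset.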
Versions / Dependencies
python==3.10.13
torch==2.2.0
ray[all]==2.23.0
minigrid==2.3.1
minihack@git+https://github.com/facebookresearch/minihack.git@c535ed1f431d1533f921a4ade8a821f787ba96c0
networkx==3.2.1
shimmy[gym]==0.2.1
gputil==1.4.0
hydra_core==1.3.2
moviepy>=1.0.0
hydra-callbacks==0.5.1
MacOS Ventura 13.6.6 (22G630)
Reproduction script
Issue Severity
High: It blocks me from completing my task.
@MarcSpeckmann Great catch! This should be called before env.reset(). We're going to change this.