ray-project/ray

[RLlib] Incorrect Callback Order

Closed · 1 comment

What happened + What you expected to happen

In my project, I primarily use the DefaultCallbacks class (from ray.rllib.algorithms.callbacks import DefaultCallbacks) to implement curriculum learning.
In on_episode_created, I set the tasks for the new episode.
It is important for me that the env.reset() method is only called afterwards.
The docstring of DefaultCallbacks.on_episode_created says that this is exactly what happens:

This method gets called after a new Episode(V2) (old stack) or
SingleAgentEpisode/MultiAgentEpisode instance has been created.
This happens before the respective sub-environment's (usually a gym.Env)
`reset()` is called by RLlib.

However, in SingleAgentEnvRunner._sample_episodes(), the reset method is called before the corresponding callback, not after it:

def _sample_episodes(
        self,
        num_episodes: int,
        explore: bool,
        random_actions: bool = False,
    ) -> List[SingleAgentEpisode]:
        """Helper method to run n episodes.

        See docstring of `self.sample()` for more details.
        """
        # If user calls sample(num_timesteps=..) after this, we must reset again
        # at the beginning.
        self._needs_initial_reset = True

        done_episodes_to_return: List[SingleAgentEpisode] = []

        # Reset the environment.
        # TODO (simon): Check, if we need here the seed from the config.
        ################ Reset ################
        obs, infos = self.env.reset()
        episodes = []
        for env_index in range(self.num_envs):
            episodes.append(self._new_episode())
            ################ Callback ################
            self._make_on_episode_callback("on_episode_created", env_index, episodes)
        _shared_data = {}

        for env_index in range(self.num_envs):
            episodes[env_index].add_env_reset(
                observation=obs[env_index],
                infos=infos[env_index],
            )
            self._make_on_episode_callback("on_episode_start", env_index, episodes)
[...]
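
To make the ordering difference concrete, here is a minimal, self-contained sketch that does not use RLlib at all. `RecordingEnv`, `start_episodes`, and `call_order` are hypothetical names made up for illustration; they only mimic the shape of the excerpt above and log when `reset()` and the callback run. The `callback_first=True` branch is my assumption of what the documented order would look like, not the actual fix.

```python
from typing import Callable, Dict, List, Tuple


class RecordingEnv:
    """Toy stand-in for a vectorized env; it only logs when reset() runs."""

    def __init__(self, num_envs: int, log: List[str]) -> None:
        self.num_envs = num_envs
        self.log = log

    def reset(self) -> Tuple[List[int], List[Dict]]:
        self.log.append("env.reset")
        obs = [0 for _ in range(self.num_envs)]
        infos = [{} for _ in range(self.num_envs)]
        return obs, infos


def start_episodes(
    env: RecordingEnv,
    on_episode_created: Callable[[int], None],
    callback_first: bool,
):
    """Create one episode per sub-env, firing the callback before or after reset()."""
    if not callback_first:
        # Order reported in this issue: reset() runs before the callback.
        obs, infos = env.reset()
    episodes = []
    for env_index in range(env.num_envs):
        episodes.append({"env_index": env_index})  # placeholder episode object
        on_episode_created(env_index)
    if callback_first:
        # Order described in the docstring: the callback runs before reset().
        obs, infos = env.reset()
    return episodes, obs, infos


def call_order(callback_first: bool) -> List[str]:
    """Return the sequence of logged calls for one round of episode creation."""
    log: List[str] = []
    env = RecordingEnv(num_envs=2, log=log)
    start_episodes(env, lambda i: log.append(f"on_episode_created[{i}]"), callback_first)
    return log


print(call_order(callback_first=False))
# ['env.reset', 'on_episode_created[0]', 'on_episode_created[1]']
print(call_order(callback_first=True))
# ['on_episode_created[0]', 'on_episode_created[1]', 'env.reset']
```

With the first ordering, anything the callback configures (such as the task for the next episode) arrives too late to influence the reset that has already happened.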

Versions / Dependencies

python==3.10.13
torch==2.2.0
ray[all]==2.23.0
minigrid==2.3.1
minihack@git+https://github.com/facebookresearch/minihack.git@c535ed1f431d1533f921a4ade8a821f787ba96c0
networkx==3.2.1
shimmy[gym]==0.2.1
gputil==1.4.0
hydra_core==1.3.2
moviepy>=1.0.0
hydra-callbacks==0.5.1

MacOS Ventura 13.6.6 (22G630)

Reproduction script

Issue Severity

High: It blocks me from completing my task.

@MarcSpeckmann Great catch! on_episode_created should be called before env.reset(). We're going to change this.