ray-project/ray

What happened + What you expected to happen

I am working on an RLlib project that uses a custom environment with action masking.

Following training, a checkpoint was created with:
algo.save(checkpoint_dir=pickle_dir)

A subsequent attempt to restore it using:
cwd = os.getcwd()
pickle_dir = cwd + f'/ActionMaskingCheckpoint'
algo = Algorithm.from_checkpoint(pickle_dir)

ultimately failed with the following error:

File ~/anaconda3/lib/python3.11/site-packages/ray/rllib/examples/rl_modules/classes/action_masking_rlm.py:17, in ActionMaskRLMBase.__init__(self, config)
15 def __init__(self, config: RLModuleConfig):
16 if not isinstance(config.observation_space, gym.spaces.Dict):
---> 17 raise ValueError(
18 "This model requires the environment to provide a "
19 "gym.spaces.Dict observation space."
20 )
21 # We need to adjust the observation space for this RL Module so that, when
22 # building the default models, the RLModule does not "see" the action mask but
23 # only the original observation space without the action mask. This tricks it
24 # into building models that are compatible with the original observation space.
25 config.observation_space = config.observation_space["observations"]

ValueError: This model requires the environment to provide a gym.spaces.Dict observation space.

Additionally, earlier in the error stack there are indications that Algorithm is interpreting the checkpoint as a multi-agent
case, which it is not.

This occurs with either a user-created masked environment or with the Ray-provided example, most likely because the observation returned by a masked environment is a dictionary structured as:
obs = {"action_mask": action_mask,
       "observations": original_observation}

Please either fix Algorithm.from_checkpoint to recognize and work with action-masked environments, or provide some
guidance on how one can manually build a method to do so.
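
By "manually", I mean something along the lines of the following untested sketch: rebuild the algorithm from the original config and load the checkpoint state into it, instead of going through Algorithm.from_checkpoint. Here PPOConfig and its settings are just placeholders for whatever config was actually used during training:

    import os
    from ray.rllib.algorithms.ppo import PPOConfig

    pickle_dir = os.path.join(os.getcwd(), 'ActionMaskingCheckpoint')

    # Must be the identical config (env, RLModule spec, etc.) that was used
    # to train and save the checkpoint in the first place.
    config = PPOConfig()  # ...same settings as during training...
    algo = config.build()

    # Load the saved state into the freshly built algorithm instead of
    # reconstructing everything via Algorithm.from_checkpoint().
    algo.restore(pickle_dir)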

Versions / Dependencies

Ray 3.0.0.dev0
Python 3.11
PyTorch 2.2.1
macOS 14.4.1 on MacBook Pro with M3 Max

Reproduction script

  1. Copy ray.rllib.examples.action_masking.py to a local directory
  2. Add the following before ray.shutdown():
    cwd = os.getcwd()
    pickle_dir = cwd + '/ActionMaskingCheckpoint'
    if not os.path.exists(pickle_dir):
        os.makedirs(pickle_dir)
        print(f'Created {pickle_dir} ...')
    algo.save(checkpoint_dir=pickle_dir)
  3. Save to local file <local_file>
  4. Run from CL:

    python <local_file>

  5. Attempt to restore the checkpoint with the following script:
    from ray.rllib.examples.envs.classes.action_mask_env import ActionMaskEnv
    from ray.rllib.algorithms.algorithm import Algorithm
    import os
    cwd = os.getcwd()
    pickle_dir = cwd + f'/ActionMaskingCheckpoint'
    algo = Algorithm.from_checkpoint(pickle_dir)

Observe errors

Issue Severity

High: It blocks me from completing my task.

@jjgriffin2 Thanks for filing this issue and apologies for the trouble. We are currently moving from an old/hybrid stack (one that already uses the RLModule and Learner APIs) to the new API stack and are also rewriting the examples. The action masking example is not yet implemented there, but it is on the list to be implemented ASAP.

The issue isn't with the example per se but with the underlying code. I cited the example because it readily demonstrates the error. The error itself shows up whenever you try to recreate an algorithm using

algo = Algorithm.from_checkpoint(pickle_dir)

if the checkpoint was created from an algorithm that was (successfully) using masking. This means that, until the underlying Algorithm.from_checkpoint() is fixed, such a checkpoint can't be restored and the model must be retrained every single time it is used, which is extraordinarily time consuming.