ray-project/ray

What happened + What you expected to happen

I am working on an RLlib project that uses a custom environment with action masking.

Following training, a checkpoint was created with:
algo.save(checkpoint_dir=pickle_dir)

A subsequent attempt to restore it using:
cwd = os.getcwd()
pickle_dir = cwd + f'/ActionMaskingCheckpoint'
algo = Algorithm.from_checkpoint(pickle_dir)

ultimately failed with the following error:

File ~/anaconda3/lib/python3.11/site-packages/ray/rllib/examples/rl_modules/classes/action_masking_rlm.py:17, in ActionMaskRLMBase.__init__(self, config)
15 def __init__(self, config: RLModuleConfig):
16 if not isinstance(config.observation_space, gym.spaces.Dict):
---> 17 raise ValueError(
18 "This model requires the environment to provide a "
19 "gym.spaces.Dict observation space."
20 )
21 # We need to adjust the observation space for this RL Module so that, when
22 # building the default models, the RLModule does not "see" the action mask but
23 # only the original observation space without the action mask. This tricks it
24 # into building models that are compatible with the original observation space.
25 config.observation_space = config.observation_space["observations"]

ValueError: This model requires the environment to provide a gym.spaces.Dict observation space.

Additionally, earlier in the error stack there are indications that Algorithm is interpreting the checkpoint as a multi-agent
case, which it is not.

This occurs with either a user-created masked environment or with the Ray-provided example, most likely because the observation returned by a masked environment is a dictionary structured as:
obs = {"action_mask": action_mask,
       "observations": original_observation}

Please either fix Algorithm.from_checkpoint to recognize and work with action-masked environments, or provide some
guidance on how one can manually build a method to do so.
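
By "manually", I mean something along the lines of the following untested sketch: rebuild the algorithm from the original config and load the checkpoint state into it, instead of going through Algorithm.from_checkpoint. Here PPOConfig and its settings are just placeholders for whatever config was actually used during training:

    import os
    from ray.rllib.algorithms.ppo import PPOConfig

    pickle_dir = os.path.join(os.getcwd(), 'ActionMaskingCheckpoint')

    # Must be the identical config (env, RLModule spec, etc.) that was used
    # to train and save the checkpoint in the first place.
    config = PPOConfig()  # ...same settings as during training...
    algo = config.build()

    # Load the saved state into the freshly built algorithm instead of
    # reconstructing everything via Algorithm.from_checkpoint().
    algo.restore(pickle_dir)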

Versions / Dependencies

Ray 3.0.0.dev0
Python 3.11
PyTorch 2.2.1
macOS 14.4.1 on MacBook Pro with M3 Max

Reproduction script

  1. Copy ray.rllib.examples.action_masking.py to a local directory
  2. Add the following before ray.shutdown():
    cwd = os.getcwd()
    pickle_dir = cwd + '/ActionMaskingCheckpoint'
    if not os.path.exists(pickle_dir):
        os.makedirs(pickle_dir)
        print(f'Created {pickle_dir} ...')
    algo.save(checkpoint_dir=pickle_dir)
  3. Save to local file <local_file>
  4. Run from CL:

    python <local_file>

  5. Attempt to restore the checkpoint with the following script:
    from ray.rllib.examples.envs.classes.action_mask_env import ActionMaskEnv
    from ray.rllib.algorithms.algorithm import Algorithm
    import os
    cwd = os.getcwd()
    pickle_dir = cwd + f'/ActionMaskingCheckpoint'
    algo = Algorithm.from_checkpoint(pickle_dir)

Observe errors

Issue Severity

High: It blocks me from completing my task.

@jjgriffin2 Thanks for filing this issue and apologies for the trouble. We are currently moving from an old/hybrid stack (one that already uses the RLModule and Learner APIs) to the new API stack and are also rewriting the examples. The action masking example is not yet implemented there, but it is on the list to be implemented ASAP.

The issue isn't with the example per se but with the underlying code. I cited the example because it readily demonstrates the error. The error itself shows up whenever you try to recreate an algorithm using

algo = Algorithm.from_checkpoint(pickle_dir)

if the checkpoint was created from an algorithm that was (successfully) using masking. This means that, until the underlying Algorithm.from_checkpoint() is fixed, such a checkpoint can't be restored and the model must be retrained every single time it is used, which is extraordinarily time consuming.