Handle full observability
leobix opened this issue · 8 comments
Hi Lucas!
If we use the FullyObsWrapper on a MiniGrid environment, the format of observation_space changes from Dict(image:Box(7, 7, 3)) to Box(19, 19, 3) (19 is just an example).
In utils/format.py, the get_preprocessor function first checks re.match("MiniGrid-.*", env_id) and assumes that every MiniGrid environment is partially observable, so it cannot handle a fully observable MiniGrid environment.
We could just swap the order of the if ... and elif ... branches to make it work, but I am not sure that would be optimal, which is why I prefer to open an issue.
Thanks :)
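To illustrate the ordering problem described above, here is a minimal sketch. It is not the actual code in utils/format.py: BoxSpace and DictSpace are hypothetical stand-ins for the Gym space classes, and the function names and returned labels are made up for illustration. Only the regular expression and the example shapes come from the report.

```python
import re


class BoxSpace:
    """Hypothetical stand-in for a Gym Box space (only the shape is kept)."""

    def __init__(self, shape):
        self.shape = shape


class DictSpace:
    """Hypothetical stand-in for a Gym Dict space (only the sub-spaces are kept)."""

    def __init__(self, spaces):
        self.spaces = spaces


def branch_name_first(env_id, obs_space):
    # Name check first: any MiniGrid id takes the dict branch, wrapped or not.
    if re.match("MiniGrid-.*", env_id):
        return "minigrid-dict"
    elif isinstance(obs_space, BoxSpace):
        return "plain-box"
    raise ValueError("unknown observation space")


def branch_space_first(env_id, obs_space):
    # Swapped order: a Box is recognised before the name is consulted.
    if isinstance(obs_space, BoxSpace):
        return "plain-box"
    elif re.match("MiniGrid-.*", env_id):
        return "minigrid-dict"
    raise ValueError("unknown observation space")


env_id = "MiniGrid-Empty-8x8-v0"
partial = DictSpace({"image": BoxSpace((7, 7, 3))})  # default, partially observable
full = BoxSpace((19, 19, 3))                         # after FullyObsWrapper (19 is an example)

print(branch_name_first(env_id, full))      # dict branch chosen for a Box space
print(branch_space_first(env_id, full))     # Box branch chosen
print(branch_space_first(env_id, partial))  # dict branch still chosen for the default space
```

With the name check first, the wrapped environment is routed to the branch that expects a Dict space, which is the failure mode reported here.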
Hi Leobix,
The code is not assuming any size at all. It takes the size given by the Gym MiniGrid environment's observation_space.
I think the issue comes instead from the Gym MiniGrid environment, where the observation space is always the same, whether the environment is partially or fully observable.
The observation_space is not the same when using the FullyObsWrapper: it is a Box instead of a Dict, and usually larger than (7, 7, 3). The issue is that the preprocessor checks whether the environment name starts with MiniGrid- to decide what to do with the observation. It should probably check whether the observation space is a Dict instead.
Okay, I see, sorry for my misunderstanding. But does this mean that the FullyObsWrapper observation space doesn't contain any instruction?
Indeed, with the FullyObsWrapper you no longer have the instruction or the mission in the observation, just the observation tensor.
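As a rough sketch of what this means for downstream code, a helper can accept both observation formats and report a missing mission explicitly. The helper name split_observation is hypothetical, the "image" and "mission" keys are an assumption about the default dict observation, and nested lists stand in for the real image tensors.

```python
def split_observation(obs):
    """Return (image, mission) for either observation format.

    Assumption: the default observation is a dict with "image" and
    "mission" keys, and the wrapped observation is the bare image tensor.
    """
    if isinstance(obs, dict):
        # Default case: the image sits next to the textual mission.
        return obs["image"], obs.get("mission")
    # Wrapped case: the observation is the image itself, with no mission.
    return obs, None


# Toy observations; nested lists stand in for the image arrays.
partial_obs = {"image": [[0, 0, 0]], "mission": "get to the green goal square"}
full_obs = [[0, 0, 0], [1, 0, 0]]

image_a, mission_a = split_observation(partial_obs)
image_b, mission_b = split_observation(full_obs)
print(mission_a)
print(mission_b)
```

Returning None for the mission keeps the call site uniform, so a model that uses text can detect that no instruction is available.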
@leobix If you have some working code, could you post it here? Are you sure it is sufficient to swap the if and elif?
Edit: Yes it is sufficient. I will commit soon.
@leobix I have committed. Can you tell me if you still have the issue?
I get this error now:
maximecb@T740p:~/Desktop/rl-starter-files$ python3 -m scripts.train --algo ppo --env MiniGrid-Empty-8x8-v0 --model DoorKey --save-interval 10 --frames 8000000
/home/maximecb/Desktop/rl-starter-files/scripts/train.py --algo ppo --env MiniGrid-Empty-8x8-v0 --model DoorKey --save-interval 10 --frames 8000000
Namespace(algo='ppo', batch_size=256, clip_eps=0.2, discount=0.99, entropy_coef=0.01, env='MiniGrid-Empty-8x8-v0', epochs=4, frames=8000000, frames_per_proc=None, gae_lambda=0.95, log_interval=1, lr=0.0007, max_grad_norm=0.5, mem=False, model='DoorKey', optim_alpha=0.99, optim_eps=1e-05, procs=16, recurrence=1, save_interval=10, seed=1, tb=False, text=False, value_loss_coef=0.5)
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/maximecb/Desktop/rl-starter-files/scripts/train.py", line 105, in <module>
    obs_space, preprocess_obss = utils.get_obss_preprocessor(args.env, envs[0].observation_space, model_dir)
  File "/home/maximecb/Desktop/rl-starter-files/utils/format.py", line 12, in get_obss_preprocessor
    print(obs_space.spaces.keys())
AttributeError: 'Box' object has no attribute 'spaces'
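The traceback above comes from reading .spaces on a Box space, an attribute that only a Dict space has. A minimal sketch of a guarded access follows. It is one possible approach, not the fix that was committed: BoxSpace and DictSpace are hypothetical stand-ins for the Gym classes, and describe_space is a made-up helper.

```python
class BoxSpace:
    """Hypothetical stand-in for a Gym Box space: has a shape, no sub-spaces."""

    def __init__(self, shape):
        self.shape = shape


class DictSpace:
    """Hypothetical stand-in for a Gym Dict space: holds named sub-spaces."""

    def __init__(self, spaces):
        self.spaces = spaces


def describe_space(obs_space):
    # Only Dict-like spaces expose .spaces, so look it up defensively.
    spaces = getattr(obs_space, "spaces", None)
    if spaces is not None:
        return sorted(spaces.keys())
    # Box-like spaces are described by their shape instead.
    return obs_space.shape


box = BoxSpace((19, 19, 3))

# Unguarded access reproduces the kind of failure shown in the traceback.
try:
    box.spaces.keys()
    failed = False
except AttributeError:
    failed = True

print(failed)
print(describe_space(DictSpace({"image": BoxSpace((7, 7, 3))})))
print(describe_space(box))
```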
@lcswillems to try the FullyObsWrapper, you only need to add two lines to scripts/train.py:
from gym_minigrid.wrappers import FullyObsWrapper
# Add after gym.make(...)
env = FullyObsWrapper(env)
Then you can test with:
python3 -m scripts.train --algo ppo --env MiniGrid-Empty-8x8-v0 --model DoorKey --save-interval 10 --frames 8000000
Thank you Maxime!
It is fixed now.