Eclectic-Sheep/sheeprl

Last `N` actions as `mlp_keys` encoder input for `dreamer_v3`


Hi,

I am working on an Atari environment wrapper with an action input buffer of length N, which I want to feed as input to the mlp_keys encoder.
Algo config:

algo:
  mlp_keys:
    encoder: [actions]

However, I am unable to get it working; I get the error TypeError: object of type 'NoneType' has no len() at

File "/home/sam/dev/ml/sheeprl/sheeprl/utils/env.py", line 171, in <listcomp>
    [k for k in env.observation_space.spaces.keys() if len(env.observation_space[k].shape) in {2, 3}]

This happens because gym.spaces.Tuple has no shape (its shape attribute is None).

What should change in this wrapper so that it interfaces correctly with what sheeprl expects? Is there a way to give the Tuple a shape, or should it be changed to a Box? If it needs to be a Box, how should it be configured?

from collections import deque

import gymnasium as gym


class InputBufferWithActionsAsInput_Atari(gym.Wrapper):
    def __init__(self, env: gym.Env, input_buffer_amount: int = 0):
        super().__init__(env)
        if input_buffer_amount <= 0:
            raise ValueError("`input_buffer_amount` should be a positive integer")
        self._input_buffer_amount = input_buffer_amount
        self._input_buf = deque(maxlen=input_buffer_amount)
        self.observation_space = gym.spaces.Dict({
                "rgb": self.env.observation_space,
                "actions": gym.spaces.Tuple([self.env.action_space] * input_buffer_amount)
            })

    def get_obs(self, observation):
        return {
            "rgb": observation,
            "actions": self._input_buf
        }

    def reset(self, **kwargs):
        obs, infos = super().reset(**kwargs)

        # Pre-fill the buffer with random actions until it is full.
        while len(self._input_buf) < self._input_buf.maxlen:
            self._input_buf.append(self.env.action_space.sample())

        return self.get_obs(obs), infos

    def step(self, action):
        # Execute the oldest buffered action and queue the new one.
        this_frame_action = self._input_buf[0]
        self._input_buf.append(action)

        obs, reward, done, truncated, infos = self.env.step(this_frame_action)

        return self.get_obs(obs), reward, done, truncated, infos
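
For reference, the wrapper is applied like this (the environment id is only an example):

env = gym.make("ALE/Pong-v5")
env = InputBufferWithActionsAsInput_Atari(env, input_buffer_amount=4)
obs, info = env.reset()  # obs is a dict with "rgb" and "actions" entries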

Edit:
I have a working setup using a hard-coded wrapper that is aware of implementation details, using something like the following. I am still wondering how to achieve a generic solution, though.

        self.observation_space = gym.spaces.Dict({
                "rgb": self.env.observation_space,
                #"last_action": self.env.action_space
                #"actions": gym.spaces.Box(shape=(self.env.action_space.shape, input_buffer_amount), dtype=np.int64)
                #"actions": gym.spaces.Box([self.env.action_space] * input_buffer_amount)
                "actions_0": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
                "actions_1": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
                "actions_2": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
                "actions_3": gym.spaces.Box(low=0, high=8, shape=(1,), dtype=np.int64),
            })

    def get_obs(self, observation: Any) -> Any:
        #observation['past_actions'] = spaces.Space(list(self._input_buf))
        return {
            "rgb": observation,
            #"last_action": self._input_buf[0]
            #"actions": np.array(self._input_buf, dtype=np.int64)
            "actions_0": self._input_buf[0],
            "actions_1": self._input_buf[1],
            "actions_2": self._input_buf[2],
            "actions_3": self._input_buf[3],
        }
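
The generic version I was aiming for would replace the per-index keys with a single Box of shape (N,), roughly like this (a sketch, assuming a Discrete action space; I have not verified how sheeprl handles this key):

        n_actions = self.env.action_space.n  # assumes a Discrete action space
        self.observation_space = gym.spaces.Dict({
                "rgb": self.env.observation_space,
                "actions": gym.spaces.Box(
                    low=0, high=n_actions - 1, shape=(input_buffer_amount,), dtype=np.int64
                ),
            })

    def get_obs(self, observation):
        return {
            "rgb": observation,
            # Flatten the deque of past actions into an (N,) integer array.
            "actions": np.array(self._input_buf, dtype=np.int64),
        }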

Hi @geranim0,
yes, the observation space must have the shape attribute. I suggest using the gymnasium.spaces.Box space to augment the observations of the environment.
I prepared a branch with the ActionsAsObservationWrapper that allows you to add the last n actions: https://github.com/Eclectic-Sheep/sheeprl/tree/feature/actions-as-obs.
You can specify the number of actions with the env.action_stack parameter. You can also add a dilation between actions (as in the FrameStack); the dilation is set with the env.action_stack_dilation parameter in the configs.

The observation key is "action_stack" (add it to the mlp_keys); using a different key creates conflicts during training.
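
Put together, the relevant configs would look roughly like this (a sketch based on the parameter names above; the exact layout and defaults on the branch may differ):

algo:
  mlp_keys:
    encoder: [action_stack]
env:
  action_stack: 4          # number of last actions to stack
  action_stack_dilation: 1 # dilation between stacked actions, as in FrameStack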

Let me know if it works

Note: Discrete actions are converted into one-hot actions (as the agent works with one-hot actions in the discrete case). We can discuss which is the best option.
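
For clarity, the conversion is the standard one-hot encoding (a minimal illustration, not the wrapper's actual code):

import numpy as np

num_actions = 9  # e.g. a Discrete(9) Atari action set
action = 3       # integer action chosen by the agent
one_hot = np.eye(num_actions, dtype=np.float32)[action]
# one_hot -> [0., 0., 0., 1., 0., 0., 0., 0., 0.]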

cc @belerico

Hi @michele,

Thanks for the branch! Taking a look and doing some tests with it.

So, I did some testing; here are the results:

[Plot: training curves for the two runs]

The gray line represents the agent trained with the last N (in this case, 12) actions added to the observations, and the blue line represents the agent trained with the same input buffer (12) but without the buffered actions as observations. Only one run was made for each, but it looks like, with a large input buffer, adding the buffered actions to the observations is helpful.

It also suggests that the wrapper works 👍

The only modification I made to your branch was adding an input buffer to the wrapper.

Great, I'm glad it works.
I do not understand why you added the input buffer and how you used it. Can you show me which modification you made?
Thanks

Sure, it is actually in my first message, in the step function: instead of using the current frame's action, I use the one at the front of the buffer, with this_frame_action = self._input_buf[0].

The purpose of this is to simulate human reaction time. That's why I wanted to test adding the input buffer to the observation, to see if it would improve performance (looks like it does).

Understood, thanks

Hi @geranim0, if this is done, we can add this feature in a new PR and include it in the next release.

Hi @belerico, sure!

A side note, though: in tests using a Discrete action space things worked fine, but I encountered some problems with the action shape not being handled for MultiDiscrete envs, both in the actions-as-obs wrapper and in dreamer_v3.py::main(), in this portion:

  real_actions = (
      torch.cat([real_act.argmax(dim=-1) for real_act in real_actions], dim=-1).cpu().numpy()
  )
  step_data["actions"] = actions.reshape((1, cfg.env.num_envs, -1))

For now I got around it by reshaping my action space to Discrete. I am still on an old branch, so I will re-test when I update.
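
The reshaping amounts to flattening the MultiDiscrete into a single Discrete with an action wrapper, roughly like this (a sketch of the idea, not my exact code):

import numpy as np
import gymnasium as gym


class FlattenMultiDiscreteActions(gym.ActionWrapper):
    """Expose a MultiDiscrete action space as a single Discrete one (sketch)."""

    def __init__(self, env: gym.Env):
        super().__init__(env)
        self._nvec = env.action_space.nvec
        self.action_space = gym.spaces.Discrete(int(np.prod(self._nvec)))

    def action(self, action):
        # Decode the flat index back into one sub-action per MultiDiscrete dimension.
        return np.array(np.unravel_index(int(action), self._nvec), dtype=np.int64)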

Hi @geranim0,
can you share the error you encountered and which environment you are using?
Thanks

I should have fixed the problem; could you check with the MultiDiscrete action space?
Thanks