Farama-Foundation/PettingZoo

[Bug Report] `parallel_to_aec_wrapper` is faulty for array action or continuous action

XuShenLZ opened this issue ยท 8 comments

Describe the bug

If I define my environment with parallel API, the parallel_to_aec_wrapper is forcing the type conversion to int at this line:

action = int(action)

This leads to two issues:

  1. If the action space has space other than (1,), such as (4,), this line would raise an error TypeError: only size-1 arrays can be converted to Python scalars since you are essentially forcing an int conversion to an array
  2. If the action space is continuous, even if the shape is (1,), this int conversion will round all continuous actions into discrete values, which messes up the action design

This bug is pretty serious, and looking forward to suggestions for a temporary workaround and long-term fixes. Although we may manually change this single line locally, we are not sure whether other places also take an scalar discrete action for granted.

Code example

This error will occur whenever you define an environment with parallel API and then let PettingZoo to automatically convert it into AEC environment

from pettingzoo import ParallelEnv
from gymnasium.spaces import Box
from pettingzoo.test.seed_test import parallel_seed_test

import numpy as np

class parallel_env(ParallelEnv):
    def __init__():
        # Some necessary initializations ..............
        self.action_spaces = dict(
            zip(
                self.possible_agents,
                [Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
                * self.n_agents,
            )
        )

# Then error would occur when you call anything that involves `parallel_to_aec_wrapper`, such as
parallel_seed_test(parallel_env, num_cycles=10, test_kept_state=True)

System info

  • Pettingzoo was pip installed
  • Version: 1.22.3
  • System: WSL Ubuntu 20.04
  • Python 3.8.10

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo

Thanks a lot for pointing this out, this is most definitely an important bug to fix. I will do some testing to see if removing the int conversion works properly.

Ran it locally and confirmed this is indeed an issue. Code to reproduce:

from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils.conversions import parallel_to_aec

env = pistonball_v6.parallel_env()
env = parallel_to_aec(env)

env.reset()

while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}  # this is where you would insert your policy
    observations, rewards, terminations, truncations, infos = env.step(actions)

Edit: okay this was my own mistake, I used the parallel code on an AEC converted environment, and got errors due to that. The correct code for this example is as follows:

from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils.conversions import parallel_to_aec

env = pistonball_v6.parallel_env()
env = parallel_to_aec(env)

env.reset()

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)
env.close()

@XuShenLZ feel free to contribute a PR or offer any suggestions if you have any

I wrote a full script expanding on the code in the original post (custom envs need certain methods implemented like reset(), step(), etc):

from __future__ import annotations

from pettingzoo import ParallelEnv
from gymnasium.spaces import Box
from pettingzoo.test.seed_test import seed_test

import numpy as np

from pettingzoo.utils import  agent_selector, parallel_to_aec


class CustomParallelEnv(ParallelEnv):
    metadata = {"is_parallelizable": True}
    def __init__(self):
        # Some necessary initializations ..............
        self.possible_agents = ["player0", "player1"]
        self.agents = self.possible_agents[:]
        self.action_spaces = dict(
            zip(
                self.possible_agents,
                [Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
                * self.num_agents,
            )
        )
        self.observation_spaces = dict(
            zip(
                self.possible_agents,
                [Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
                * self.num_agents,
            )
        )
        super().__init__()

    def reset(
        self,
        seed: int | None = None,
        options: dict | None = None,
    ):

        self.rewards =  {agent: 0.0 for agent in self.possible_agents}
        self.terminations = {agent: False for agent in self.possible_agents}
        self.truncations = {agent: False for agent in self.possible_agents}
        self.infos = {agent: {} for agent in self.possible_agents}

        self._agent_selector = agent_selector(self.agents)
        self.agent_selection = self._agent_selector.next()

        for space in self.action_spaces:
            self.action_spaces[space].seed(seed)

        for space in self.observation_spaces:
            self.observation_spaces[space].seed(seed)

        obs = self.observe(self.agent_selection)
        info = self.infos
        return obs, info

    def observe(self, agent):
        return {agent: self.observation_space(agent).sample() for agent in self.agents}

    def last(self):
        obs = self.observe(self.agent_selection)
        return obs, self.rewards, self.terminations, self.truncations, self.infos

    def step(self, actions):
        obs = {agent: self.observation_space(agent).sample() for agent in self.agents}

        self.agent_selection = self._agent_selector.next()
        return obs, self.rewards, self.terminations, self.truncations, self.infos

if __name__ == "__main__":
    seed_test(lambda: parallel_to_aec(CustomParallelEnv()))

Runs into the same error referenced in the original post: action = int(action) TypeError: int() argument must be a string, a bytes-like object or a number, not 'dict'

The problem here is that the action which is passed into the parallel_to_aec_wrapper is a dict: {'player0': [ 0.30459853 -0.45622468], 'player1': [-0.95994083 0.33921258]} which obviously can't be converted directly to an integer. Switching the line [Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)] for [gymnasium.spaces.Discrete(5)] results in the same error with it being a dict.

@elliottower I edited the code in the last comment, please check if it's still consistent with what you wanted to do and I'm not missing something stupid.

The changes are that I seeded all the obs-action spaces in reset for the seeding test to actually work; and I changed the parallel_seed_test to regular seed_test, because it should be running on an AEC instead of a parallel env

As to the problem at hand, I'm trying to see if there's any issue whatsoever with just removing that conversion, and I can't find any. That part of the code will be directly user-facing, so users can directly ensure that they cast their actions to int, if that is actually necessary.

Also from lurking the git blame, it seems to have been added last year, in a PR that was only supposed to add pyright type checking. So I'm 99% sure we can just safely remove this, maybe fix a type hint here or there

As to the problem at hand, I'm trying to see if there's any issue whatsoever with just removing that conversion, and I can't find any. That part of the code will be directly user-facing, so users can directly ensure that they cast their actions to int, if that is actually necessary.

Also from lurking the git blame, it seems to have been added last year, in a PR that was only supposed to add pyright type checking. So I'm 99% sure we can just safely remove this, maybe fix a type hint here or there

Sounds good to me

Fixed in #975