[Bug Report] `parallel_to_aec_wrapper` is faulty for array action or continuous action
XuShenLZ opened this issue ยท 8 comments
Describe the bug
If I define my environment with parallel API, the parallel_to_aec_wrapper
is forcing the type conversion to int
at this line:
PettingZoo/pettingzoo/utils/conversions.py
Line 300 in aff1907
This leads to two issues:
- If the action space has space other than
(1,)
, such as(4,)
, this line would raise an errorTypeError: only size-1 arrays can be converted to Python scalars
since you are essentially forcing anint
conversion to an array - If the action space is continuous, even if the shape is
(1,)
, thisint
conversion will round all continuous actions into discrete values, which messes up the action design
This bug is pretty serious, and looking forward to suggestions for a temporary workaround and long-term fixes. Although we may manually change this single line locally, we are not sure whether other places also take an scalar discrete action for granted.
Code example
This error will occur whenever you define an environment with parallel API and then let PettingZoo to automatically convert it into AEC environment
from pettingzoo import ParallelEnv
from gymnasium.spaces import Box
from pettingzoo.test.seed_test import parallel_seed_test
import numpy as np
class parallel_env(ParallelEnv):
def __init__():
# Some necessary initializations ..............
self.action_spaces = dict(
zip(
self.possible_agents,
[Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
* self.n_agents,
)
)
# Then error would occur when you call anything that involves `parallel_to_aec_wrapper`, such as
parallel_seed_test(parallel_env, num_cycles=10, test_kept_state=True)
System info
- Pettingzoo was pip installed
- Version: 1.22.3
- System: WSL Ubuntu 20.04
- Python 3.8.10
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo
Thanks a lot for pointing this out, this is most definitely an important bug to fix. I will do some testing to see if removing the int conversion works properly.
Ran it locally and confirmed this is indeed an issue. Code to reproduce:
from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils.conversions import parallel_to_aec
env = pistonball_v6.parallel_env()
env = parallel_to_aec(env)
env.reset()
while env.agents:
actions = {agent: env.action_space(agent).sample() for agent in env.agents} # this is where you would insert your policy
observations, rewards, terminations, truncations, infos = env.step(actions)
Edit: okay this was my own mistake, I used the parallel code on an AEC converted environment, and got errors due to that. The correct code for this example is as follows:
from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils.conversions import parallel_to_aec
env = pistonball_v6.parallel_env()
env = parallel_to_aec(env)
env.reset()
for agent in env.agent_iter():
observation, reward, termination, truncation, info = env.last()
if termination or truncation:
action = None
else:
action = env.action_space(agent).sample() # this is where you would insert your policy
env.step(action)
env.close()
@XuShenLZ feel free to contribute a PR or offer any suggestions if you have any
I wrote a full script expanding on the code in the original post (custom envs need certain methods implemented like reset(), step(), etc):
from __future__ import annotations
from pettingzoo import ParallelEnv
from gymnasium.spaces import Box
from pettingzoo.test.seed_test import seed_test
import numpy as np
from pettingzoo.utils import agent_selector, parallel_to_aec
class CustomParallelEnv(ParallelEnv):
metadata = {"is_parallelizable": True}
def __init__(self):
# Some necessary initializations ..............
self.possible_agents = ["player0", "player1"]
self.agents = self.possible_agents[:]
self.action_spaces = dict(
zip(
self.possible_agents,
[Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
* self.num_agents,
)
)
self.observation_spaces = dict(
zip(
self.possible_agents,
[Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
* self.num_agents,
)
)
super().__init__()
def reset(
self,
seed: int | None = None,
options: dict | None = None,
):
self.rewards = {agent: 0.0 for agent in self.possible_agents}
self.terminations = {agent: False for agent in self.possible_agents}
self.truncations = {agent: False for agent in self.possible_agents}
self.infos = {agent: {} for agent in self.possible_agents}
self._agent_selector = agent_selector(self.agents)
self.agent_selection = self._agent_selector.next()
for space in self.action_spaces:
self.action_spaces[space].seed(seed)
for space in self.observation_spaces:
self.observation_spaces[space].seed(seed)
obs = self.observe(self.agent_selection)
info = self.infos
return obs, info
def observe(self, agent):
return {agent: self.observation_space(agent).sample() for agent in self.agents}
def last(self):
obs = self.observe(self.agent_selection)
return obs, self.rewards, self.terminations, self.truncations, self.infos
def step(self, actions):
obs = {agent: self.observation_space(agent).sample() for agent in self.agents}
self.agent_selection = self._agent_selector.next()
return obs, self.rewards, self.terminations, self.truncations, self.infos
if __name__ == "__main__":
seed_test(lambda: parallel_to_aec(CustomParallelEnv()))
Runs into the same error referenced in the original post: action = int(action) TypeError: int() argument must be a string, a bytes-like object or a number, not 'dict'
The problem here is that the action which is passed into the parallel_to_aec_wrapper
is a dict: {'player0': [ 0.30459853 -0.45622468], 'player1': [-0.95994083 0.33921258]}
which obviously can't be converted directly to an integer. Switching the line [Box(low=np.array([-1, -0.5]), high=np.array([1, 0.5]), dtype=np.float64)]
for [gymnasium.spaces.Discrete(5)]
results in the same error with it being a dict.
@elliottower I edited the code in the last comment, please check if it's still consistent with what you wanted to do and I'm not missing something stupid.
The changes are that I seeded all the obs-action spaces in reset
for the seeding test to actually work; and I changed the parallel_seed_test
to regular seed_test
, because it should be running on an AEC instead of a parallel env
As to the problem at hand, I'm trying to see if there's any issue whatsoever with just removing that conversion, and I can't find any. That part of the code will be directly user-facing, so users can directly ensure that they cast their actions to int
, if that is actually necessary.
Also from lurking the git blame
, it seems to have been added last year, in a PR that was only supposed to add pyright type checking. So I'm 99% sure we can just safely remove this, maybe fix a type hint here or there
As to the problem at hand, I'm trying to see if there's any issue whatsoever with just removing that conversion, and I can't find any. That part of the code will be directly user-facing, so users can directly ensure that they cast their actions to
int
, if that is actually necessary.Also from lurking the
git blame
, it seems to have been added last year, in a PR that was only supposed to add pyright type checking. So I'm 99% sure we can just safely remove this, maybe fix a type hint here or there
Sounds good to me
Fixed in #975