Farama-Foundation/PettingZoo

[Bug Report] MPE SimpleEnv continuous actions are the "other way"

mrxaxen opened this issue · 3 comments

Describe the bug

At the moment, the SimpleEnv continuous action.u computations are the opposite of the discrete case. In the current setup, when an agent receives [0, 1, 0, 1, 0], for example, it starts moving toward the top right instead of the expected bottom left, given the agent and adversary action space: [no_action, move_left, move_right, move_down, move_up].

Code example

# simple_env.py, from line 206
if self.continuous_actions:
    # Process continuous action as in OpenAI MPE
    agent.action.u[0] += action[0][1] - action[0][2]  # Here
    agent.action.u[1] += action[0][3] - action[0][4]  # And here
else:
    # process discrete action
    if action[0] == 1:
        agent.action.u[0] = -1.0
    if action[0] == 2:
        agent.action.u[0] = +1.0
    if action[0] == 3:
        agent.action.u[1] = -1.0
    if action[0] == 4:
        agent.action.u[1] = +1.0
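
For reference, a minimal reproduction sketch (untested; it assumes the parallel API and that the MPE module is named simple_v3, both of which may differ across PettingZoo versions):

from pettingzoo.mpe import simple_v3  # assumed module name; adjust for your version
import numpy as np

env = simple_v3.parallel_env(continuous_actions=True)
observations, infos = env.reset(seed=42)  # older versions return only observations

# Action layout: [no_action, move_left, move_right, move_down, move_up]
move_left = np.array([0.0, 1.0, 0.0, 0.0, 0.0], dtype=np.float32)
actions = {agent: move_left for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)

# In simple, the first two observation entries are the agent's velocity,
# so after a "move_left" action obs[0] should be negative; with the
# current code it comes out positive (the agent drifts right).
for agent, obs in observations.items():
    print(agent, "x-velocity after move_left:", obs[0])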

System info

No response

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo

Thanks for bringing this up. I'm not very familiar with the MPE environments (and they are being moved to a new repository), but could you by any chance tell me exactly how this should be corrected? I can make a PR; I just don't fully know which lines correspond to the inverted directions and should be switched.

I am by no means an expert in MPE, or that experienced with it for that matter, and I'm open to corrections if I've misinterpreted the intentions here. However, intuitively, the action space's interpretation should not change based on whether it is continuous or discrete; and if it does change, that should be mentioned in the docs.
Currently, if we want to tell an agent to move left in the discrete case, then
action[0] == 1
agent.action.u[0] == -1
which makes the agent move toward negative x in a standard Cartesian coordinate system. However, doing the same in the continuous case:
action = [[0, 1, 0, 0, 0]]
agent.action.u[0] += action[0][1] - action[0][2]
results in:
agent.action.u[0] == +1
So we're effectively moving in the opposite direction, and the same happens on the Y axis.
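To make the mismatch concrete, here is a standalone sketch of the arithmetic (hypothetical variable names, just mirroring the two branches):

# Discrete branch: action 1 (move_left) sets a negative x force.
u_x_discrete = -1.0

# Continuous branch, same intent expressed as a one-hot on the move_left slot:
action = [[0.0, 1.0, 0.0, 0.0, 0.0]]
u_x_continuous = action[0][1] - action[0][2]  # = +1.0

print(u_x_discrete, u_x_continuous)  # -1.0 vs +1.0: opposite directions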
Either doing

agent.action.u[0] += action[0][2] - action[0][1] # notice the index change: right minus left
agent.action.u[1] += action[0][4] - action[0][3] # up minus down

or

agent.action.u[0] -= action[0][1] - action[0][2] # flip the sign of the whole expression
agent.action.u[1] -= action[0][3] - action[0][4]

should suffice, but the first option might be preferred, since it reads naturally as right minus left and up minus down.
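
Applied to the snippet above, the first option would look like this (a sketch of the corrected continuous branch, not a tested patch):

if self.continuous_actions:
    # Process continuous action as in OpenAI MPE
    agent.action.u[0] += action[0][2] - action[0][1]  # right minus left
    agent.action.u[1] += action[0][4] - action[0][3]  # up minus down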

Ok, thanks for the explanation.