[Bug Report] MPE SimpleEnv continuous actions are the "other way"
mrxaxen opened this issue · 3 comments
Describe the bug
At the moment the simple env `action.u` computations are the opposite of the discrete environment's. In the current setup, when the agents receive `[0, 1, 0, 1, 0]`, for example, they start moving to the top right instead of the expected bottom left, based on the agent and adversary action space: `[no_action, move_left, move_right, move_down, move_up]`.
Code example
```python
# simple_env.py from line 206
if self.continuous_actions:
    # Process continuous action as in OpenAI MPE
    agent.action.u[0] += action[0][1] - action[0][2]  # Here
    agent.action.u[1] += action[0][3] - action[0][4]  # And here
else:
    # process discrete action
    if action[0] == 1:
        agent.action.u[0] = -1.0
    if action[0] == 2:
        agent.action.u[0] = +1.0
    if action[0] == 3:
        agent.action.u[1] = -1.0
    if action[0] == 4:
        agent.action.u[1] = +1.0
```
System info
No response
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo
Thanks for bringing this up. I'm not very familiar with the MPE environments (and they are being moved to a new repository), but could you by any chance tell me exactly how this should be corrected? I can make a PR; I just don't fully know which lines correspond to the inverted directions and should be switched.
I am by no means an expert in MPE, or that experienced in it for that matter, and I'm open to corrections if I misinterpret the intentions here. However, intuitively the action space's interpretation should not change based on whether it is continuous or discrete; or, if it does change, it should be mentioned in the docs.
Currently, if we want to tell an agent to move left in the discrete case, then

```python
action[0] == 1
agent.action.u[0] == -1
```
Which would make the agent move toward negative x in a standard Cartesian coordinate system. However, doing the same in the continuous case:

```python
action = [[0, 1, 0, 0, 0]]
agent.action.u[0] += action[0][1] - action[0][2]
```

which results in:

```python
agent.action.u[0] == +1
```

So we're basically going in the opposite direction, and the same happens on the Y axis.
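To make the mismatch concrete, here is a minimal standalone sketch (hypothetical helper functions, not the actual PettingZoo code) that mirrors the two branches of `simple_env.py` and applies "move_left" in both encodings:

```python
def discrete_u(action):
    """Discrete branch: action index -> [u_x, u_y], as in simple_env.py."""
    u = [0.0, 0.0]
    if action == 1:
        u[0] = -1.0  # move_left
    if action == 2:
        u[0] = +1.0  # move_right
    if action == 3:
        u[1] = -1.0  # move_down
    if action == 4:
        u[1] = +1.0  # move_up
    return u


def continuous_u(action):
    """Continuous branch exactly as currently written in simple_env.py."""
    u = [0.0, 0.0]
    u[0] += action[1] - action[2]
    u[1] += action[3] - action[4]
    return u


# "move_left" in both encodings
print(discrete_u(1))                  # [-1.0, 0.0]
print(continuous_u([0, 1, 0, 0, 0]))  # [1.0, 0.0] -- opposite sign
```

The continuous branch produces the opposite sign for the same semantic action.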
Either doing

```python
agent.action.u[0] += action[0][2] - action[0][1]  # notice the index change
agent.action.u[1] += action[0][4] - action[0][3]
```

or

```python
agent.action.u[0] -= action[0][1] - action[0][2]
agent.action.u[1] -= action[0][3] - action[0][4]
```

should suffice, but the first option might be preferred.
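A quick sanity check of the first proposed fix, again as a standalone sketch (the function name is hypothetical; the body is the index-swapped version of the continuous branch):

```python
def continuous_u_fixed(action):
    """Continuous branch with the proposed index swap applied."""
    u = [0.0, 0.0]
    u[0] += action[2] - action[1]  # move_right minus move_left
    u[1] += action[4] - action[3]  # move_up minus move_down
    return u


print(continuous_u_fixed([0, 1, 0, 0, 0]))  # [-1.0, 0.0] -> move_left, matches discrete
print(continuous_u_fixed([0, 0, 0, 1, 0]))  # [0.0, -1.0] -> move_down, matches discrete
```

With this version, each one-hot continuous action yields the same sign on `u` as the corresponding discrete action index.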
Ok thanks for the explanation