[Bug] Training with (n, 1) dimensional Box/MultiDiscrete action spaces throwing error
JennoMai opened this issue · 1 comment
Steps to reproduce
- Wrote a gym environment using a MultiDiscrete action space
- Copied training code from the PETS example (threw error)
- Tried replacing the MultiDiscrete action space with a similarly-shaped Box space (threw the same error)
Observed Results
In the traceback below, I have a length-9 observation space and a length-2 action space; I believe the code might be concatenating the two, but only a length-1 set of actions is being expected.
Traceback (most recent call last):
File "train_swarm.py", line 170, in <module>
env, obs, agent, {}, replay_buffer)
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/util/common.py", line 570, in step_env_and_add_to_buffer
action = agent.act(obs, **agent_kwargs)
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 650, in act
trajectory_eval_fn, callback=optimizer_callback
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 526, in optimize
callback=callback,
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 134, in optimize
values = obj_fun(population)
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 646, in trajectory_eval_fn
return self.trajectory_eval_fn(obs, action_sequences)
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/planning/trajectory_opt.py", line 710, in trajectory_eval_fn
action_sequences, initial_state=initial_state, num_particles=num_particles
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/model_env.py", line 173, in evaluate_action_sequences
_, rewards, dones, _ = self.step(action_batch, sample=True)
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/model_env.py", line 119, in step
rng=self._rng,
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/one_dim_tr_model.py", line 289, in sample
model_in = self._get_model_input_from_tensors(obs, actions)
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/models/one_dim_tr_model.py", line 128, in _get_model_input_from_tensors
model_in = self.input_normalizer.normalize(model_in).float()
File "/home/jennomai/miniconda3/envs/botnetenv/lib/python3.7/site-packages/mbrl/util/math.py", line 144, in normalize
return (val - self.mean) / self.std
RuntimeError: The size of tensor a (10) must match the size of tensor b (11) at non-singleton dimension 1
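The mismatch above can be reproduced in isolation. A minimal sketch (hypothetical shapes inferred from the traceback: the normalizer's statistics appear to be sized for obs(9) + action(1) = 10 features, while the actual input is obs(9) + action(2) = 11):

```python
import numpy as np

# Normalizer statistics sized for 10 input features (9 obs + 1 action).
mean = np.zeros((1, 10))
std = np.ones((1, 10))

# Actual model input: a batch of obs(9) concatenated with action(2) = 11 features.
model_in = np.zeros((32, 11))

try:
    _ = (model_in - mean) / std  # shapes (32, 11) vs (1, 10) cannot broadcast
    raised = False
except ValueError:
    raised = True
```

NumPy raises a ValueError here; the equivalent shape mismatch on PyTorch tensors raises the RuntimeError shown in the traceback.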
Expected Results
This runtime error shouldn't be thrown.
Relevant Code
The gym environment I'm using is very messy right now but can be found here, and the corresponding training code is here.
However, the code depends heavily on the Botnet simulator, so it may be easier to try to replicate using the MultiAgentEnv here?
Hi @JennoMai, sorry for the delay, I forgot about this issue.
We haven't in fact experimented with multi-dimensional action spaces, but one thing I can say for sure is that the dynamics model used in PETS won't work out of the box for this. Notice this line. This refers to a 1-D model, which is hard-coded to assume that both state and action tensors are one-dimensional, and which constructs model inputs by concatenating the two; this is the standard setup for the proprioceptive control problems for which PETS and MBPO were initially proposed.
For your particular application, you can probably keep the main PETS skeleton, but you would need to replace the model architecture with something better suited to your task. You can take a look at our PlaNet implementation for an example of a different kind of model that receives multi-dimensional (visual) state data but uses the same planning algorithm as PETS.
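To make the dimension arithmetic concrete, here is a hedged sketch (not mbrl-lib code; `make_model_input` is a hypothetical helper) of the concatenation convention the 1-D model assumes, and of how a multi-dimensional action would need to be flattened, with the normalizer sized for the full flattened input:

```python
import numpy as np

def make_model_input(obs, action):
    """Flatten the action and concatenate it to the observation,
    mimicking the 1-D model's input convention."""
    # obs: (batch, obs_dim); action: (batch, *action_shape)
    action_flat = action.reshape(action.shape[0], -1)
    return np.concatenate([obs, action_flat], axis=-1)

obs = np.zeros((32, 9))      # length-9 observation, as in the report
action = np.zeros((32, 2))   # length-2 MultiDiscrete/Box sample
model_in = make_model_input(obs, action)

# The input normalizer's mean/std must then be created with 11 features
# (9 + 2), not 10, or the broadcast in normalize() fails as reported.
```

This only addresses the input plumbing; as noted above, a genuinely multi-dimensional action space likely calls for a different model architecture rather than simple flattening.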