Sync Vec porting
spyroot opened this issue · 0 comments
Hi Tristan,
I need to port some of your code to the new gym API, since I'm using the new, much more direct Python bindings (i.e., without mujoco-py). There is one part I don't understand after moving to the new gym (I refactored all the code to use terminated, truncated, etc.).
I think the observation list creates a bit of a problem: when observations = None, the upstream code doesn't like it.
I updated the dones handling so that it checks terminated.
Do you remember the logic for step_wait? It looks like you are moving one step in each env?
In the current gym version they do
observation, info = env.reset()
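To make the signatures I'm porting to concrete, here is a minimal sketch. ToyEnv is a hypothetical stand-in I wrote for illustration (it does not import gym); it only mimics the new conventions, where reset() returns (observation, info) and step() returns five values instead of four.

```python
class ToyEnv:
    """Hypothetical stand-in that mimics the new gym call signatures."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        observation = float(self.t)
        reward = 1.0
        terminated = False                   # this toy has no terminal state
        truncated = self.t >= self.horizon   # time limit reached
        return observation, reward, terminated, truncated, {}


env = ToyEnv(horizon=3)
observation, info = env.reset(seed=0)
done, steps = False, 0
while not done:
    observation, reward, terminated, truncated, info = env.step(0)
    # The old single `done` flag corresponds to either of the two new flags.
    done = terminated or truncated
    steps += 1
```

So wherever the old code stored a single done flag, I assume it should now store terminated or truncated.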
:) I keep tracing, but honestly it is hard to follow, because they also swapped the order of half of the arguments (concatenate, etc.).
If you have spare time, I basically just need to understand the logic. If I understood correctly, each env was created from the same seed, and batch_idx corresponds to each action in each env?
    def step_wait(self):
        observations_list, infos = [], []
        batch_ids, j = [], 0
        num_actions = len(self._actions)
        rewards = np.zeros((num_actions,), dtype=np.float_)
        for i, env in enumerate(self.envs):
            if self._dones[i]:
                continue
            action = self._actions[j]
            observation, rewards[j], self._dones[i], info = env.step(action)
            batch_ids.append(i)
            if not self._dones[i]:
                observations_list.append(observation)
                infos.append(info)
            j += 1
        assert num_actions == j
        if observations_list:
            observations = create_empty_array(self.single_observation_space,
                                              n=len(observations_list),
                                              fn=np.zeros)
            concatenate(observations_list,
                        observations,
                        self.single_observation_space)
        else:
            observations = None
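
For reference, here is how I currently read that loop, as a plain-Python sketch ported to the five-value step(). It is not your implementation. CountdownEnv, the free-function form of step_wait, the returned tuple, and returning an empty list instead of None when every env has finished are all my own assumptions. As I understand it, i walks over all envs, j walks only over the actions (one per env that is still running), and batch_ids records which env each action went to.

```python
class CountdownEnv:
    """Hypothetical toy env: terminates after `lifetime` steps."""

    def __init__(self, lifetime):
        self.remaining = lifetime

    def step(self, action):
        self.remaining -= 1
        terminated = self.remaining <= 0
        # The observation just echoes the action so the routing is visible.
        return float(action), 1.0, terminated, False, {}


def step_wait(envs, dones, actions):
    """Apply one action to each env that is still running; skip finished envs."""
    observations, infos, batch_ids = [], [], []
    rewards = [0.0] * len(actions)
    j = 0  # index into actions; advances only for envs that are not done
    for i, env in enumerate(envs):
        if dones[i]:
            continue
        observation, rewards[j], terminated, truncated, info = env.step(actions[j])
        dones[i] = terminated or truncated
        batch_ids.append(i)
        if not dones[i]:
            # Keep observations only for envs that will be stepped again.
            observations.append(observation)
            infos.append(info)
        j += 1
    assert j == len(actions)
    # Assumption: an empty list (not None) signals that every env has finished.
    return observations, rewards, list(dones), {"batch_ids": batch_ids, "infos": infos}


envs = [CountdownEnv(1), CountdownEnv(3), CountdownEnv(2)]
dones = [False, False, False]
obs1, rew1, dones1, extra1 = step_wait(envs, dones, [10, 11, 12])
obs2, rew2, dones2, extra2 = step_wait(envs, dones, [21, 22])
```

In the second call only two actions are passed, because env 0 already finished in the first call. Is that the intended contract, i.e. the caller must supply exactly one action per env that is not done yet?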