tristandeleu/pytorch-maml-rl

Sync Vec porting

spyroot opened this issue · 0 comments

Hi Tristan,

I need to port some of your code to the new gym API, since I'm using the new, much more direct Python bindings (i.e., without mujoco-py). I have refactored all the code to use terminated, truncated, etc., but there is one part I still don't understand after moving to the new gym.

I think the observation list creates a bit of a problem: when observations = None, the upstream code doesn't like it.
I adapted the dones handling so that it now checks terminated.
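
To make the dones change concrete, here is a minimal sketch of what I mean, assuming the new step() returns a 5-tuple (observation, reward, terminated, truncated, info). ToyEnv and step_with_done are hypothetical names used only for illustration; they are not part of your repo or of gym.

```python
class ToyEnv:
    """Hypothetical stand-in for an env with the new-style step() 5-tuple."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def step(self, action):
        self.t += 1
        observation = float(self.t)
        reward = 1.0
        terminated = False                  # this toy env has no terminal state
        truncated = self.t >= self.horizon  # time limit reached
        return observation, reward, terminated, truncated, {}


def step_with_done(env, action):
    """Fold (terminated, truncated) into one done flag, as stored in self._dones."""
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    return observation, reward, done, info


env = ToyEnv(horizon=2)
first = step_with_done(env, 0)   # episode still running
second = step_with_done(env, 0)  # time limit hit, so done is True
print(first, second)
```

Checking only terminated would miss time-limit truncation, so in this sketch I treat either flag as the end of the episode. I am not sure that is what the original code intends.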

Do you remember the logic for step_wait? It looks like you are moving one step in each env.
In the current gym version, reset returns a pair:

observation, info = env.reset()
:) I keep tracing, but honestly it is hard to follow, because they also swapped the argument order in about half of the functions (concatenate, etc.).
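
For the reset side, this is roughly how I unpack the new return value where the old code expected a bare observation. It is only a sketch: ToyResetEnv and reset_observation are hypothetical names, and I am assuming the new reset() accepts keyword-only seed and options arguments and returns (observation, info).

```python
class ToyResetEnv:
    """Hypothetical env whose reset() follows the new (observation, info) convention."""

    def reset(self, *, seed=None, options=None):
        observation = 0.0
        info = {"seed": seed}
        return observation, info


def reset_observation(env, seed=None):
    """Return only the observation, which is what the old-style callers expect."""
    observation, _info = env.reset(seed=seed)
    return observation


obs = reset_observation(ToyResetEnv(), seed=123)
print(obs)
```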

If you have spare time, I basically just need to understand the logic. If I understood correctly, each env is created from the same seed, and batch_ids records which env each action was applied to?
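
Here is how I currently read the index bookkeeping in the loop, as a toy sketch without any gym dependency. The assumption is that actions are supplied only for envs that are not done yet, so j counts the active envs while i runs over all of them. active_env_indices is a hypothetical helper, not something from your code.

```python
def active_env_indices(dones):
    """Indices i of envs that get stepped; position j in the result is the action index."""
    return [i for i, done in enumerate(dones) if not done]


dones = [False, True, False, False]  # env 1 already finished its episode
actions = ["a0", "a1", "a2"]         # one action per env that is still running

batch_ids = active_env_indices(dones)
assert len(actions) == len(batch_ids)  # same check as `assert num_actions == j`

# Pair each action index j with the env index i it is applied to.
pairs = list(zip(batch_ids, actions))
print(pairs)  # [(0, 'a0'), (2, 'a1'), (3, 'a2')]
```

For reference, this is the method I am asking about: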

    def step_wait(self):
        observations_list, infos = [], []
        batch_ids, j = [], 0
        num_actions = len(self._actions)
        rewards = np.zeros((num_actions,), dtype=np.float_)
        for i, env in enumerate(self.envs):
            if self._dones[i]:
                continue

            action = self._actions[j]
            observation, rewards[j], self._dones[i], info = env.step(action)
            batch_ids.append(i)

            if not self._dones[i]:
                observations_list.append(observation)
                infos.append(info)
            j += 1
        assert num_actions == j

        if observations_list:
            observations = create_empty_array(self.single_observation_space,
                                              n=len(observations_list),
                                              fn=np.zeros)
            concatenate(observations_list,
                        observations,
                        self.single_observation_space)
        else:
            observations = None