question about env reset

Question

lijie9527 opened this issue a year ago · 2 comments

obs, _ = env.reset()
obs = torch.as_tensor(obs, dtype=torch.float32, device=device)
ep_ret, ep_cost, ep_len = (
    np.zeros(args.num_envs),
    np.zeros(args.num_envs),
    np.zeros(args.num_envs),
)
# training loop
for epoch in range(epochs):
    rollout_start_time = time.time()
    # collect samples until we have enough to update
    for steps in range(local_steps_per_epoch):

Why does your code call env.reset() only once at the beginning, rather than at the start of each epoch?

Answer 1 · 2023-10-21T11:10:29.000Z

That's because in safepo/common/env.py we wrap the environment either with SafetyAsyncVectorEnv or with the AutoReset wrapper (SafeAutoResetWrapper) from Safety-Gymnasium.

def make_sa_mujoco_env(num_envs: int, env_id: str, seed: int | None = None):
    if num_envs > 1:
        # Some code here
        env = SafetyAsyncVectorEnv(env_fns)
    else:
        # Some code here
        env = SafeAutoResetWrapper(env)
        # Some code here
    return env, obs_space, act_space

If you use a single environment, the AutoReset wrapper resets it whenever an episode ends. If you use a vectorized environment, SafetyAsyncVectorEnv resets each sub-environment individually when its own episode ends.
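
To illustrate the idea, here is a simplified sketch of what an auto-reset wrapper does. It is not the Safety-Gymnasium implementation: CountingEnv and ToyAutoReset are hypothetical names, the six-value step return mirrors the Safety-Gymnasium convention, and storing the last observation under "final_observation" is an assumption made for this example.

```python
class CountingEnv:
    """Hypothetical toy environment: an episode is truncated after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.resets = 0

    def reset(self):
        self.t = 0
        self.resets += 1
        return self.t, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        # obs, reward, cost, terminated, truncated, info
        return self.t, 1.0, 0.0, False, truncated, {}


class ToyAutoReset:
    """Illustrative wrapper: resets the wrapped env as soon as an episode ends."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, cost, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # Keep the last observation of the finished episode (assumed key name),
            # then hand back the first observation of the next episode.
            info = dict(info, final_observation=obs)
            obs, _ = self.env.reset()
        return obs, reward, cost, terminated, truncated, info


env = ToyAutoReset(CountingEnv(horizon=3))
obs, _ = env.reset()  # the only explicit reset in the caller's code
observations = []
for _ in range(7):
    obs, reward, cost, terminated, truncated, info = env.step(0)
    observations.append(obs)
```

Because the wrapper resets internally, the caller can keep stepping across episode boundaries, which is why a single env.reset() before the training loop is enough.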
Additionally, if your custom environment does not support auto reset, please call reset manually at the algorithm level.
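
For a custom environment without auto reset, a manual reset in the rollout loop could look roughly like the sketch below. NoAutoResetEnv and collect are hypothetical stand-ins written for this example, not part of SafePO, and the policy is replaced by a constant action.

```python
class NoAutoResetEnv:
    """Hypothetical custom environment that does NOT reset itself."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.max_steps
        # obs, reward, cost, terminated, truncated, info
        return float(self.t), 1.0, 0.0, False, truncated, {}


def collect(env, epochs, steps_per_epoch):
    """Roll out the env, resetting manually whenever an episode ends."""
    obs, _ = env.reset()  # reset once before training starts
    ep_ret, ep_len = 0.0, 0
    finished = []
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            action = 0  # placeholder for the policy's action
            obs, reward, cost, terminated, truncated, _ = env.step(action)
            ep_ret += reward
            ep_len += 1
            if terminated or truncated:
                # Manual reset at the algorithm level
                finished.append((ep_ret, ep_len))
                obs, _ = env.reset()
                ep_ret, ep_len = 0.0, 0
    return finished


episodes = collect(NoAutoResetEnv(max_steps=5), epochs=2, steps_per_epoch=8)
```

Note that the reset is tied to the end of an episode, not to the start of an epoch, so an episode that is still running when an epoch ends simply continues into the next one.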

Answer 2 · 2023-10-22T13:08:06.000Z

Thank you for the answer, I understand.