question about env reset
lijie9527 opened this issue · 2 comments
lijie9527 commented
obs, _ = env.reset()
obs = torch.as_tensor(obs, dtype=torch.float32, device=device)
ep_ret, ep_cost, ep_len = (
    np.zeros(args.num_envs),
    np.zeros(args.num_envs),
    np.zeros(args.num_envs),
)
# training loop
for epoch in range(epochs):
    rollout_start_time = time.time()
    # collect samples until we have enough to update
    for steps in range(local_steps_per_epoch):
Why does your code call env.reset() only once at the beginning, rather than at the start of each epoch?
Gaiejj commented
That's because in safepo/common/env.py, we wrap the environment with the SafetyAsyncVectorEnv and AutoReset wrappers from Safety-Gymnasium.
def make_sa_mujoco_env(num_envs: int, env_id: str, seed: int | None = None):
    if num_envs > 1:
        # Some code here
        env = SafetyAsyncVectorEnv(env_fns)
    else:
        # Some code here
        env = SafeAutoResetWrapper(env)
    # Some code here
    return env, obs_space, act_space
If you use a single environment, AutoReset resets it whenever an episode ends. If you use a vectorized environment, SafetyAsyncVectorEnv resets each sub-environment separately as its own episode ends.
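To illustrate why one reset at the start is enough, here is a minimal sketch of the auto-reset idea. ToyEnv and ToyAutoReset are hypothetical stand-ins written for this example, not the Safety-Gymnasium classes; the only thing borrowed from Safety-Gymnasium is the six-value step tuple (obs, reward, cost, terminated, truncated, info).

```python
class ToyEnv:
    """Hypothetical environment whose episodes last `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.num_resets = 0

    def reset(self):
        self.t = 0
        self.num_resets += 1
        return float(self.t), {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        # obs, reward, cost, terminated, truncated, info
        return float(self.t), 1.0, 0.0, False, truncated, {}


class ToyAutoReset:
    """Hypothetical wrapper: resets the wrapped env as soon as an episode ends."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, cost, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # Keep the last observation of the finished episode in info,
            # then return the first observation of the next episode.
            info = {**info, "final_observation": obs}
            obs, _ = self.env.reset()
        return obs, reward, cost, terminated, truncated, info


env = ToyAutoReset(ToyEnv(horizon=3))
obs, _ = env.reset()  # the only explicit reset in the whole run
episodes_finished = 0
for epoch in range(2):
    for step in range(6):
        obs, reward, cost, terminated, truncated, info = env.step(action=0)
        episodes_finished += int(terminated or truncated)
```

The training loop never calls reset again, yet every episode boundary is handled inside step, so episodes can also span epoch boundaries without losing any samples.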
Additionally, if your custom environment does not support auto-reset, please call reset manually at the algorithm level.
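A manual reset at the algorithm level could look roughly like the sketch below. It assumes a single, non-vectorized environment that returns the six-value step tuple (obs, reward, cost, terminated, truncated, info); ToyEnv and collect_rollout are hypothetical names for this example and are not part of the repository.

```python
class ToyEnv:
    """Hypothetical environment without auto-reset; episodes last `horizon` steps."""

    def __init__(self, horizon=4):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return float(self.t), {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        # obs, reward, cost, terminated, truncated, info
        return float(self.t), 1.0, 0.0, False, truncated, {}


def collect_rollout(env, epochs, steps_per_epoch):
    """Step the env and reset it by hand whenever an episode ends."""
    obs, _ = env.reset()
    ep_ret, ep_len = 0.0, 0
    finished = []  # (return, length) of every completed episode
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            obs, reward, cost, terminated, truncated, _ = env.step(0)
            ep_ret += reward
            ep_len += 1
            if terminated or truncated:
                finished.append((ep_ret, ep_len))
                ep_ret, ep_len = 0.0, 0
                # Manual reset: the env will not do this on its own.
                obs, _ = env.reset()
    return finished


finished = collect_rollout(ToyEnv(horizon=4), epochs=2, steps_per_epoch=6)
```

Resetting on the done signal rather than at the start of each epoch keeps episodes intact when they cross an epoch boundary.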
lijie9527 commented
Thank you for the answer, I understand.