cpnota/autonomous-learning-library

Support for Vector Envs

cpnota opened this issue · 4 comments

@cpnota So I worked on this quite a bit, but after almost finishing, I figured out that the Gym vector API is completely incompatible with ALL.

The problem is what happens to the observation during reset.

ALL's parallel experiment (and any sane system) expects this:

O1, r1, d1, i1
O2, r2, d2, i2
...
Ot, rt, dt, it
O1, r1, d1, i1
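
To make that expectation concrete, here is a rough sketch of a parallel rollout loop built around that stream. The names are hypothetical, not ALL's actual API: the point is only that when done is true, the returned observation is the terminal Ot, it gets a mask of 0, and the reset observation arrives on the following step.

```python
# Illustrative only: a rollout loop with the semantics described above.
# `envs` is a hypothetical vectorized env whose step() returns the terminal
# observation O_t when done is True; the reset observation O_1 of the next
# episode arrives on the following step. Names are not ALL's actual API.
import numpy as np

def parallel_rollout(envs, agent, num_steps):
    obs = envs.reset()
    for _ in range(num_steps):
        actions = agent.act(obs)
        next_obs, rewards, dones, infos = envs.step(actions)
        # Mask out terminal observations so the agent does not bootstrap
        # its value estimate through O_t.
        mask = 1.0 - dones.astype(np.float32)
        agent.observe(next_obs, rewards, mask)
        obs = next_obs
```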

Unfortunately, the gym vector API is not sane, so it does this:

O1, r1, d1, i1
O2, r2, d2, i2
...
O1, rt, dt, it
O2, r2, d2, i2

Since the environments are interleaved and reset at different times (they autoreset when done), this does not work with ALL's notion of a mask on Ot.
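
The behavior is easy to reproduce in a few lines (this assumes an older Gym version whose vector step() returns (obs, rewards, dones, infos) and autoresets when done; the terminal observation itself is never returned):

```python
# Demonstrates Gym's vector autoreset behavior. When dones[i] is True,
# obs[i] is already the reset observation of the next episode, while
# rewards[i], dones[i], and infos[i] describe the episode that just ended.
import gym

envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
obs = envs.reset()
for _ in range(500):
    obs, rewards, dones, infos = envs.step(envs.action_space.sample())
    for i, done in enumerate(dones):
        if done:
            # obs[i] is O_1 of a fresh episode, paired with the final
            # reward/done/info of the episode that just terminated.
            print(f"env {i}: episode ended, reward={rewards[i]}, obs[i] is a reset obs")
```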

There are a few options:

  1. Ignore the problem. The agent will not train on the first action, only on the 2nd and later actions (see the sketch after this list).
  2. I have my own implementations of vector environments in SuperSuit. So in theory, I can just make them compatible with ALL via an argument. But this means that ALL will not have perfect interoperability with Gym's and Stable Baselines' vector environments. It's not clear that this is a big problem, considering how poor interoperability between Stable Baselines and Gym's vector environments is right now.
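
To make option 1 concrete, here is a hypothetical sketch (not ALL's actual code) of what "ignoring the problem" implies: the done flag arrives alongside the reset observation O1, so O1 gets masked exactly as a terminal state would be, and the first action of each new episode never contributes to the update.

```python
# Hypothetical illustration of option 1. The done flag that gym returns
# alongside the reset observation O_1 is applied as if it described O_1,
# so O_1 is masked like a terminal state and the first action of each new
# episode is effectively never trained on.
import numpy as np

def to_masked_batch(obs, rewards, dones):
    # mask is zero on the step carrying O_1, not on the step carrying O_t
    mask = 1.0 - dones.astype(np.float32)
    return obs, rewards, mask
```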

Thoughts?

Oh I see: instead of giving a terminal "observation", it provides the previous episode's final reward etc. during the first timestep of the next episode. In their defense, I see why they did it that way (to avoid masking). But that is sort of frustrating.

Are 1 and 2 incompatible? For most of the current environments, I'm guessing it won't really make any difference, but it could be nice to have the option. I would probably go with the minimum viable fix for now.

No, the two options are perfectly compatible. I realized that after sending the last message.

Closed by #240