instadeepai/jumanji

Jumanji is not suitable for meta-learning, but adding an options parameter to the reset method would fix this

Howuhh opened this issue · 0 comments

Is your feature request related to a problem? Please describe

Hi, I'm currently developing a library of environments for meta-RL research. To avoid reinventing the wheel, I wanted to build on the Jumanji interface (I like it better than gymnax, and Jumanji is more actively maintained), but I've found that with the current interface this is extremely difficult, if not impossible.

In meta-RL we need to be able to adaptively change the environment parameters, or the parameters of the problem generator, and we need to do it from outside the environment: if sampling happens only inside the environment, we lose the ability to implement training curriculums other than the one hardcoded by default. Therefore, it must be possible to pass these parameters when resetting the environment. Gymnax does something similar.

Describe the solution you'd like

It seems to me that it would be enough to change the reset interface to:

reset(self, key: chex.PRNGKey, options: None | EnvOptions = None) -> Tuple[State, TimeStep]

where EnvOptions is an arbitrary jit-compatible dataclass. The step method can be left as it is, because these options can be stored in State, so there is no need to pass them explicitly any further. The only important thing is the ability to change them on reset. Gymnasium actually does this too (though for different reasons, I guess...).
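To make the proposal concrete, here is a minimal sketch of what I have in mind. EnvOptions, State, TimeStep and ToyEnv are hypothetical stand-ins, not Jumanji's actual classes; the point is only that options form a pytree, get stored in the state on reset, and can therefore be varied from outside a jitted reset:

```python
from typing import NamedTuple, Optional, Tuple

import jax
import jax.numpy as jnp


class EnvOptions(NamedTuple):  # NamedTuples are pytrees, hence jit-compatible
    grid_size: jnp.ndarray     # a problem-difficulty knob, set from outside


class State(NamedTuple):
    key: jax.Array
    position: jax.Array
    options: EnvOptions        # options live in the state, so step needs no change


class TimeStep(NamedTuple):
    observation: jax.Array


DEFAULT_OPTIONS = EnvOptions(grid_size=jnp.asarray(5))


class ToyEnv:
    def reset(
        self, key: jax.Array, options: Optional[EnvOptions] = None
    ) -> Tuple[State, TimeStep]:
        opts = DEFAULT_OPTIONS if options is None else options
        # Sample an initial position whose range depends on the options.
        position = jax.random.randint(key, (), 0, opts.grid_size)
        state = State(key=key, position=position, options=opts)
        return state, TimeStep(observation=position)


env = ToyEnv()
reset_fn = jax.jit(env.reset)  # options is a pytree, so the call jits cleanly
state, timestep = reset_fn(
    jax.random.PRNGKey(0), EnvOptions(grid_size=jnp.asarray(10))
)
```

An outer training loop could then resample or schedule EnvOptions between episodes to implement an arbitrary curriculum, without the environment itself knowing anything about it.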

Currently I plan to add this argument in a subclass for my own environments, but this will break compatibility with Jumanji's wrappers, for example, and with the rest of the ecosystem in general.

Describe alternatives you've considered

We cannot use the common meta-RL interface of env.set_task(task_params): after the step and reset methods are jitted, the call has no effect. We also cannot supply the parameters at initialization, since the base Environment class is not jit-compatible and must be created once, outside the jitted region.
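A minimal illustration (toy code, not Jumanji's) of why the set_task approach fails: a plain Python attribute read inside a jitted function is baked in as a constant at trace time, so later mutations are invisible to the cached trace:

```python
import jax
import jax.numpy as jnp


class Env:
    def __init__(self):
        self.goal = 1.0           # the "task", stored as a Python attribute

    def step(self, x):
        return x + self.goal      # self.goal is read once, when step is traced

    def set_task(self, goal):     # the common meta-RL mutation interface
        self.goal = goal


env = Env()
step = jax.jit(env.step)
before = step(jnp.asarray(0.0))   # traces with goal=1.0 -> 1.0

env.set_task(5.0)                 # mutates the Python object only
after = step(jnp.asarray(0.0))    # cached trace is reused -> still 1.0
```

Passing the task parameters as a pytree argument (as in the proposed reset options) avoids this, because traced arguments are real inputs to the compiled function rather than trace-time constants.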
