thu-ml/tianshou

Support Dict observation spaces

MischaPanch opened this issue · 7 comments

Maybe also action spaces.

I'm not sure what the status of the current support is, and I can't estimate the complexity.

It's probably not a priority, but if an external contributor wants to look into it, we could review this. The solution should come with proper documentation.

Looking at how other projects support complex action/observation spaces might be a good start.

Related issues: #1064

I have an environment with a variable-size observation: a text sequence. But the batch mechanism doesn't seem to be compatible with variable-length observations. How can I deal with this?

I suggest either wrapping your environment so that its observations are not a dict space (this should always be possible), or working on a PR that solves this issue (which might not be easy, though).
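For illustration, here is a minimal sketch of the wrapping idea in plain NumPy: the dict observation is flattened into one vector with a fixed key order. The function name `flatten_dict_obs` and the example keys are hypothetical, not part of tianshou. Gymnasium also ships a `FlattenObservation` wrapper that serves the same purpose for spaces it can flatten.

```python
import numpy as np


def flatten_dict_obs(obs: dict) -> np.ndarray:
    """Concatenate the entries of a dict observation into one flat float32 vector."""
    # Sort the keys so the layout of the flat vector is identical at every step.
    parts = [np.asarray(obs[key], dtype=np.float32).ravel() for key in sorted(obs)]
    return np.concatenate(parts)


# Hypothetical dict observation, for illustration only.
obs = {"position": np.array([0.5, -1.0]), "image": np.zeros((2, 2)), "step": 3}
flat = flatten_dict_obs(obs)
print(flat.shape)  # (7,): 4 image values, 2 position values, 1 step value
```

Inside an env wrapper, a function like this would be applied to every observation returned by reset and step, and the wrapper's observation space would be declared as a flat box of the matching size.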

Not sure what you mean by variable observation. If it's a text sequence as an array, Batch can handle it, but you will need a custom Agent for processing text.

My environment is a web browser. The returned observation is the text sequence shown on the web page, so its length changes at every step. The batch mechanism raises errors when it handles this kind of observation. I don't think padding and truncating should be necessary as post-processing for the observations.

It always raises an error in the __setitem__ method of the Batch class:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (2,124)
obs_next has a different shape from obs, which is what triggers the error.
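As a rough illustration of how this kind of error can arise, the sketch below reproduces a shape mismatch in plain NumPy. It is not tianshou's buffer code; the `storage` array and the `try_store` helper are hypothetical, and the assumption is that the storage was allocated for fixed-length observations and later receives values of a different shape.

```python
import numpy as np

# Hypothetical pre-allocated storage: 4 slots, each holding a length-124 observation.
storage = np.zeros((4, 124))


def try_store(values: np.ndarray) -> str:
    """Write `values` into slots 0 and 1 and report whether the write succeeded."""
    try:
        storage[[0, 1]] = values
        return "ok"
    except ValueError as err:
        return f"ValueError: {err}"


print(try_store(np.ones((2, 124))))  # shapes agree, so the write succeeds
print(try_store(np.ones(2)))         # shape (2,) cannot fill a (2, 124) target
```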

I see. It's not really related to this issue, which is about Dict interfaces.

Your environment violates the gym/Gymnasium API, where an env is assumed to have a fixed numerical observation space. In any case, for your model training you do need to process the sequences into arrays of the same length, right?

I suggest you wrap your environment with a Wrapper that turns it into a gym-like env. Supporting non-gym envs is outside the scope of tianshou for now, though we might come back to it in the distant future for better support of RLHF.
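A minimal sketch of such a wrapper, assuming the observations are lists of integer token ids. `ToyTextEnv` and `FixedLengthObs` are hypothetical names, and the classes only mimic the Gymnasium-style reset/step signatures without importing the library, so a real wrapper would also need to declare a matching observation space.

```python
import numpy as np


class ToyTextEnv:
    """Hypothetical stand-in for an env that returns token lists of varying length."""

    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return [1, 2, 3], {}

    def step(self, action):
        self._t += 1
        obs = list(range(1, 2 + 2 * self._t))  # the sequence grows every step
        return obs, 0.0, self._t >= 3, False, {}


class FixedLengthObs:
    """Wrap an env so every observation is an int64 array of shape (max_len,)."""

    def __init__(self, env, max_len, pad_id=0):
        self.env, self.max_len, self.pad_id = env, max_len, pad_id

    def _fix(self, tokens):
        out = np.full(self.max_len, self.pad_id, dtype=np.int64)
        kept = tokens[: self.max_len]  # truncate sequences that are too long
        out[: len(kept)] = kept        # shorter sequences keep trailing padding
        return out

    def reset(self):
        obs, info = self.env.reset()
        return self._fix(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._fix(obs), reward, terminated, truncated, info


env = FixedLengthObs(ToyTextEnv(), max_len=5)
first, _ = env.reset()
shapes = [first.shape] + [env.step(0)[0].shape for _ in range(3)]
print(shapes)  # every entry is (5,)
```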

Thanks for your advice. I checked the tianshou.data module, and figuring out how to replace the Batch class seems to need a lot of time. I will try padding and masking the input to solve it.
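The pad-and-mask approach could look roughly like the sketch below. The helper `pad_and_mask` and the choice of `pad_id=0` are assumptions for illustration; the mask records which positions hold real tokens so that a model can ignore the padding, shown here with a masked mean.

```python
import numpy as np


def pad_and_mask(sequences, max_len, pad_id=0):
    """Return (tokens, mask), both of shape (len(sequences), max_len)."""
    tokens = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for row, seq in enumerate(sequences):
        kept = seq[:max_len]              # truncate sequences longer than max_len
        tokens[row, : len(kept)] = kept
        mask[row, : len(kept)] = True     # True marks real (non-padding) tokens
    return tokens, mask


tokens, mask = pad_and_mask([[5, 6], [7, 8, 9, 10]], max_len=3)
# Average over real tokens only, so padding does not skew the result.
masked_mean = (tokens * mask).sum(axis=1) / mask.sum(axis=1)
print(tokens.tolist())       # [[5, 6, 0], [7, 8, 9]]
print(mask.tolist())         # [[True, True, False], [True, True, True]]
print(masked_mean.tolist())  # [5.5, 8.0]
```

Note that an empty sequence would produce an all-False mask row, so the masked mean would need a guard against division by zero in that case.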

Yes, Batch is pretty fundamental in tianshou and used everywhere ^^