thu-ml/tianshou

data recording and saving method

Xiong5Heng opened this issue · 4 comments

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

Hi,

When I use SubprocVectorEnv, I want to record the rewards from all environments. Do you have a similar function to VecMonitor in SB3 (https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#stable_baselines3.common.vec_env.VecMonitor)?

I would suggest using an environment wrapper for that. At the moment, tianshou is primarily an algorithm library and is not focused on wrappers. In fact, you could just use the wrapper from SB3 together with tianshou.
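If it helps, here is a minimal sketch of the wrapper approach: each worker env is wrapped before it is handed to SubprocVectorEnv, so episode returns get recorded inside the subprocesses. The choice of gymnasium's RecordEpisodeStatistics wrapper and of CartPole-v1 is just for illustration, not the only way to do it:

    # Sketch only: wrap each worker env with a reward-recording wrapper before
    # building the vectorized env. RecordEpisodeStatistics (gymnasium) puts the
    # episode return in info["episode"]["r"] when an episode ends.
    import gymnasium as gym
    from tianshou.env import SubprocVectorEnv

    def make_env():
        env = gym.make("CartPole-v1")  # any env of your choice
        return gym.wrappers.RecordEpisodeStatistics(env)

    train_envs = SubprocVectorEnv([make_env for _ in range(8)])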

Let me know if this answers your question

Hi,
Thanks for your reply, and I will try your solution.
But if I do not use the wrapper from SB3, is there any other way to record the rewards from all vector environments?

The best way would be to use an env wrapper. Note that in all examples you can create your own env factory with your own wrapper. I'll try to add a tutorial on how to do that soon.

Apart from that, you can probably use a custom logger. You can also access the buffer directly during training through the trainer; all rewards are saved there.

In the very near future we will add support for callbacks during training, which will then provide the simplest way to save custom data (see #977, #895).
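For illustration, a rough sketch of reading rewards back out of the buffer. The names (VectorReplayBuffer, the rew field) follow tianshou's replay-buffer conventions, but treat the slice-indexing detail as an assumption that may differ slightly between versions:

    # Sketch only: rewards from all vectorized envs end up in the replay buffer,
    # so they can be read back directly after (or during) training.
    from tianshou.data import VectorReplayBuffer

    # One sub-buffer per worker env; pass this buffer to your Collector.
    buf = VectorReplayBuffer(total_size=20000, buffer_num=8)
    # ... run training with a Collector that uses `buf` ...

    # buf[:] returns a Batch over all valid transitions; "rew" holds the
    # per-step rewards collected from all environments.
    if len(buf) > 0:
        rewards = buf[:].rew
        print("steps collected:", len(buf), "mean step reward:", rewards.mean())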

Thanks for your brilliant work!