thu-ml/tianshou

Replicating results in collect random operations through seed setting

Closed this issue · 3 comments

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

Hello, I am working with tianshou=0.5.0 on a Windows 11 machine. I am having trouble replicating results because, as far as I know, there is no seed option in the standard DQN framework (for instance, in the collector when picking a random action, or when sampling a mini-batch from the replay buffer). I tried simply setting

import numpy as np
np.random.seed(SEED)

but it does not seem to work.

Is there any way this can be solved in the latest version of Tianshou, or does it need to be implemented?
Here is the code snippet in the library which is not controllable through setting a seed:

# collect method in the Collector class
# get the next action
if random:
    try:
        act_sample = [
            self._action_space[i].sample() for i in ready_env_ids
        ]
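A possible explanation for why `np.random.seed(SEED)` has no effect on this snippet: if the action spaces are Gym/Gymnasium-style spaces, each one keeps its own random generator, so `sample()` never reads NumPy's global state. The sketch below illustrates that behaviour with a toy class. `ToyDiscreteSpace` and `draw` are hypothetical names invented for this example, not Tianshou or Gymnasium code; real Gymnasium spaces expose a comparable `seed()` method.

```python
import numpy as np

SEED = 42  # arbitrary example value


class ToyDiscreteSpace:
    """Hypothetical stand-in for an action space that owns a private RNG."""

    def __init__(self, n):
        self.n = n
        # Private generator, seeded from OS entropy by default.
        self._rng = np.random.default_rng()

    def seed(self, seed):
        # Re-create the private generator from an explicit seed.
        self._rng = np.random.default_rng(seed)

    def sample(self):
        # Draw one action index in [0, n) from the private generator.
        return int(self._rng.integers(self.n))


def draw(space, k=10):
    """Collect k consecutive samples from the space."""
    return [space.sample() for _ in range(k)]


space = ToyDiscreteSpace(1000)

# Seeding NumPy's global state does not touch the private generator.
np.random.seed(SEED)
global_a = draw(space)
np.random.seed(SEED)
global_b = draw(space)

# Seeding the space itself makes the draws repeatable.
space.seed(SEED)
local_a = draw(space)
space.seed(SEED)
local_b = draw(space)

print("global seed only, repeatable:", global_a == global_b)
print("space seeded, repeatable:", local_a == local_b)
```

Under this assumption, reproducible random actions require seeding every action space that the collector samples from, in addition to any global seeds.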

@maxhuettenrauch @bordeauxred could you look into that pls?

We have already implemented better seeding support, and more is on the way. So either what you need is already possible, or it will be possible very soon.

This is likely a bug in SubprocVectorEnv. If you have only a single env instance, you can resort to using a DummyVectorEnv (action_spaces are properly seeded there) until we push a fix for this problem.

Fixed in #1103