thu-ml/tianshou

Some issues regarding configuration parameters

yshichseu opened this issue · 5 comments

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • [✔] I have visited the source website
  • [✔] I have searched through the issue tracker for duplicates
  • [✔] I have mentioned version numbers, operating system and environment, where applicable:
 import tianshou, gymnasium as gym, torch, numpy, sys
 print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

 >>> 1.0.0 0.28.1 2.2.2+cpu 1.24.4 3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)] win32

Thank you to the authors for providing these concise and easy-to-implement algorithms, which allowed me to get started with reinforcement learning quickly and meet my own needs. As a beginner in reinforcement learning, I would like to ask some basic questions. I would like to know the specific function and meaning of the `eps_test` and `step_per_epoch` parameters, especially `step_per_epoch`. My understanding is that it is the number of steps per episode, but the number of steps per episode changes dynamically during interaction with the environment, so I am not sure how this parameter should be set. If there is relevant documentation, please point me to it (sorry, I have not been able to find it myself, thank you!). I look forward to your answer, thank you!

As far as I understand, step_per_epoch is the total number of agent-environment interactions (obs, action, next_obs, reward, done) within one epoch. One epoch may contain more than one episode (see this tutorial part on training a policy).
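A toy illustration of this (plain Python, not Tianshou code; the episode lengths are made up) may help: an epoch is a fixed budget of environment steps, and however many episodes fit into that budget get packed into it, so their count and lengths vary:

```python
import random

random.seed(0)

# step_per_epoch: a fixed budget of environment interactions per epoch.
step_per_epoch = 100
steps_done, episodes = 0, 0

# Episodes vary in length, so one epoch may contain several episodes
# (or end partway through one).
while steps_done < step_per_epoch:
    episode_len = random.randint(10, 40)  # a made-up episode length
    steps_done += episode_len
    episodes += 1

print(f"collected {steps_done} steps across {episodes} episodes")
```

The point is that `step_per_epoch` counts individual (obs, action, next_obs, reward, done) transitions, not episodes, so it does not need to match any episode length.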

eps_test is short for epsilon_test, where epsilon is a value in the range [0, 1] specifying the probability of taking a random (exploratory) action in the DQN algorithm (cf. the original paper and tianshou.highlevel.trainer.EpochTestCallbackDQNSetEps).
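A minimal epsilon-greedy sketch (illustrative, not Tianshou's actual implementation) shows the role of the two values: eps_train is typically > 0 so the agent explores during data collection, while eps_test is usually 0 or very small so evaluation reflects the learned greedy policy:

```python
import random

def epsilon_greedy(q_values, eps, rng=random):
    """With probability eps pick a random action index, else the argmax."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q, eps=0.0)  # eps_test = 0 -> always greedy
print(greedy_action)  # 1
```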

Yes, the whole epoch and training-related steps topic is currently confusing and needs more documentation; we will add more soon. I generally plan to remove the "epoch" concept from Tianshou, as it has no intrinsic meaning in the RL context. Right now an epoch essentially only affects logging; all other parameters are far more important.

Epochs primarily control the periodicity of performance validation (testing). And since Tianshou will keep the best policy according to these test rollouts, it's somewhat important. The term "epoch" is debatable, but the validation aspect is typically the same for supervised learning algorithms.
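The "keep the best policy" logic described above can be sketched like this (illustrative pseudologic, not Tianshou internals; the reward values are mock data): after each epoch's test rollouts, the policy snapshot with the highest mean test reward is retained.

```python
# One mock mean test reward per epoch (made-up numbers).
mock_test_rewards = [12.0, 35.5, 28.1, 41.2, 39.0]

best_reward = float("-inf")
best_epoch = None
for epoch, reward in enumerate(mock_test_rewards, start=1):
    if reward > best_reward:  # new best -> this is where a snapshot is kept
        best_reward, best_epoch = reward, epoch

print(f"best policy is from epoch {best_epoch} (mean test reward {best_reward})")
```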

For an explanation of the epoch semantics, please read the docstrings in SamplingConfig.