Adjust locations of setting the policy in train/eval mode
maxhuettenrauch commented
Currently, tianshou sets the policy's mode in the trainer and in the `test_episode` function. The corresponding `training` attribute is then used to determine whether a stochastic policy should be evaluated deterministically (given that `policy.deterministic_eval` is `True`). This, however, is a misuse: the `training` attribute primarily affects modules like dropout and batch norm. It should always be `False` during data collection and only be `True` inside `policy.learn`.
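For illustration, here is a minimal sketch of the problematic pattern; `MisusedModePolicy` and its internals are hypothetical stand-ins, not Tianshou's actual classes:

```python
import torch
import torch.nn as nn


class MisusedModePolicy(nn.Module):
    """Illustrative stand-in only -- not Tianshou's actual policy class."""

    def __init__(self, obs_dim: int = 4, act_dim: int = 2,
                 deterministic_eval: bool = True):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, 16),
            nn.Dropout(0.1),  # the kind of module `training` is actually for
            nn.Linear(16, act_dim),
        )
        self.log_sigma = nn.Parameter(torch.zeros(act_dim))
        self.deterministic_eval = deterministic_eval

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        mu = self.actor(obs)
        # The misuse: `self.training`, which torch intends to toggle
        # dropout/batchnorm behaviour, is reused to decide whether the
        # stochastic policy samples or acts deterministically.
        if self.deterministic_eval and not self.training:
            return mu
        return torch.distributions.Normal(mu, self.log_sigma.exp()).sample()
```

Calling `policy.eval()` here entangles two unrelated concerns: switching off dropout for a rollout also silently changes the action distribution.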
opcode81 commented
Max and I have implemented the following solution in #1123:
- We introduced a new flag `is_within_training_step`, which is enabled by the training algorithm when within a training step, where a training step encompasses training data collection and policy updates. This flag is now used by algorithms to decide whether their `deterministic_eval` setting should indeed apply, instead of the torch training flag (which was abused!).
- The policy's training/eval mode (which should control torch-level learning only) no longer needs to be set in user code in order to control collector behaviour (this didn't make sense!). The respective calls have been removed.
- The policy should, in fact, always be in evaluation mode when performing data collection, as there is no reason to ever have gradient accumulation enabled for any type of rollout. We thus specifically set the policy to evaluation mode in `Collector.collect`. A sketch of the resulting control flow follows after this list.
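As a rough illustration of the new design, here is a self-contained sketch; `SketchPolicy`, `within_training_step`, and `collect` are hypothetical stand-ins and do not reproduce Tianshou's actual APIs from #1123:

```python
import contextlib

import torch
import torch.nn as nn


class SketchPolicy(nn.Module):
    """Illustrative stand-in loosely mirroring the design described above."""

    def __init__(self, obs_dim: int = 4, act_dim: int = 2,
                 deterministic_eval: bool = True):
        super().__init__()
        self.mu = nn.Linear(obs_dim, act_dim)
        self.log_sigma = nn.Parameter(torch.zeros(act_dim))
        self.deterministic_eval = deterministic_eval
        # New flag, orthogonal to the torch train/eval mode: True only
        # while the training algorithm is inside a training step
        # (training data collection + policy updates).
        self.is_within_training_step = False

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        mu = self.mu(obs)
        # deterministic_eval now keys off the new flag, not self.training.
        if self.deterministic_eval and not self.is_within_training_step:
            return mu
        return torch.distributions.Normal(mu, self.log_sigma.exp()).sample()


@contextlib.contextmanager
def within_training_step(policy: SketchPolicy):
    """Set by the training algorithm around each training step."""
    policy.is_within_training_step = True
    try:
        yield
    finally:
        policy.is_within_training_step = False


def collect(policy: SketchPolicy, obs: torch.Tensor) -> torch.Tensor:
    # Rollouts never need gradient accumulation or train-mode layer
    # behaviour, so the collector always switches to torch eval mode,
    # analogous to what Collector.collect now does.
    policy.eval()
    with torch.no_grad():
        return policy(obs)


obs = torch.randn(1, 4)
with within_training_step(policy := SketchPolicy()):
    act = collect(policy, obs)  # stochastic: inside a training step
act = collect(policy, obs)      # deterministic: evaluation rollout
```

The point of the sketch is the decoupling: the torch mode is managed entirely by the collector, while the new flag alone decides whether `deterministic_eval` takes effect.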