[Question] Shared feature extractor and gradient
brn-dev opened this issue · 2 comments
❓ Question
Hi, I have two questions:
Firstly, the comment here says the opposite of what the code does. I am assuming the code is the intended behavior, right?
stable-baselines3/stable_baselines3/common/policies.py
Lines 972 to 975 in 512eea9
Secondly, does this only make sense in off-policy settings (when using Q-critics)? I checked out the ActorCriticPolicy, which is used in on-policy algorithms, and there both the actor and the critic loss influence the feature extractor's parameters (if shared):
stable-baselines3/stable_baselines3/common/policies.py
Lines 645 to 651 in 512eea9
stable-baselines3/stable_baselines3/common/policies.py
Lines 660 to 672 in 512eea9
stable-baselines3/stable_baselines3/common/torch_layers.py
Lines 252 to 257 in 512eea9
The features generated by the first feature extractor (self.feature_extractor) are passed into the mlp_extractor, where they are used to form the actor and value latent features without the gradient being blocked anywhere. So both the actor and the critic loss influence this feature extractor, whereas in the ContinuousCritic we prevent the feature extractor from being updated by the critic loss if it is shared with the actor.
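To illustrate the point, here is a minimal toy sketch (the module names are made up, this is not SB3 code): because nothing gates the gradient between the shared extractor and the value branch, even the critic loss alone updates the extractor's parameters.

```python
# Toy sketch of the on-policy wiring described above (not SB3's actual classes):
# a shared extractor feeds both heads with no gradient gating.
import torch as th
import torch.nn as nn

extractor = nn.Linear(4, 16)   # stands in for the shared features extractor
actor_head = nn.Linear(16, 2)  # stands in for the latent_pi / action branch
value_head = nn.Linear(16, 1)  # stands in for the latent_vf / value branch

obs = th.randn(8, 4)
features = extractor(obs)                   # no set_grad_enabled gating here
critic_loss = value_head(features).pow(2).mean()
critic_loss.backward()                      # backprop the critic loss alone...
print(extractor.weight.grad is not None)    # ...True: it reaches the extractor
```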
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.
Firstly, the comment here says the opposite of the code.
I think the comment is correct although a bit confusing.
When the feature extractor is shared, you don't want gradients to be backpropagated through the critic, hence set_grad_enabled(not self.share_features_extractor).
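A hedged sketch of that gating idea (the names here are illustrative toy code, not SB3's internals): when the extractor is shared with the actor, the critic computes the features with gradients disabled, so the critic loss cannot update the shared extractor.

```python
# Illustrative toy code: disabling gradients on the shared extractor
# prevents the critic loss from updating it.
import torch as th
import torch.nn as nn

share_features_extractor = True
extractor = nn.Linear(4, 16)     # shared with the actor in this scenario
q_head = nn.Linear(16 + 2, 1)    # stands in for one q-network head

obs, actions = th.randn(8, 4), th.randn(8, 2)
with th.set_grad_enabled(not share_features_extractor):
    features = extractor(obs)    # grad disabled when the extractor is shared

critic_loss = q_head(th.cat([features, actions], dim=1)).pow(2).mean()
critic_loss.backward()
print(extractor.weight.grad)     # None: no gradient flowed into the extractor
```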
Secondly, does this only make sense in off policy settings
This choice is based on experience. The recommendation is actually not to share the feature extractor when using off-policy algorithms. Another option would be to use the critic loss only to learn the features, but it is not recommended to use both losses when sharing the feature extractor (they can conflict).
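A hedged sketch of that alternative (toy code with illustrative names): let only the critic loss shape the shared features by detaching them on the actor side, so the two losses do not pull the extractor in conflicting directions.

```python
# Illustrative toy code: only the critic loss updates the shared extractor.
import torch as th
import torch.nn as nn

extractor = nn.Linear(4, 16)
actor_head = nn.Linear(16, 2)
q_head = nn.Linear(16 + 2, 1)

obs, actions = th.randn(8, 4), th.randn(8, 2)
features = extractor(obs)

critic_loss = q_head(th.cat([features, actions], dim=1)).pow(2).mean()
actor_loss = actor_head(features.detach()).pow(2).mean()  # actor gradient stops here

(critic_loss + actor_loss).backward()
# extractor.weight.grad now comes from the critic loss only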
ActorCriticPolicy which is used in on policy algorithms
In the on-policy setting, where the actor and critic updates are done in one go, it seems (experimentally) that you can use both losses to update the features (and you can also use separate networks).
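A hedged sketch of that one-go update (toy code, not SB3's training loop): the policy and value losses are combined into a single objective, and one optimizer step updates the shared extractor and both heads together.

```python
# Illustrative toy code: a single combined loss updates shared features.
import torch as th
import torch.nn as nn

extractor = nn.Linear(4, 16)
policy_head = nn.Linear(16, 2)
value_head = nn.Linear(16, 1)
params = [*extractor.parameters(), *policy_head.parameters(), *value_head.parameters()]
optimizer = th.optim.Adam(params, lr=3e-4)

obs = th.randn(8, 4)
features = extractor(obs)
policy_loss = policy_head(features).pow(2).mean()  # placeholder for the actor loss
value_loss = value_head(features).pow(2).mean()    # placeholder for the critic loss

loss = policy_loss + 0.5 * value_loss              # vf_coef-style weighting
optimizer.zero_grad()
loss.backward()
optimizer.step()                                   # shared extractor updated by both losses
```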
Makes sense, thanks!