DLR-RM/stable-baselines3

[Question] Shared feature extractor and gradient

brn-dev opened this issue · 2 comments

❓ Question

Hi, I have two questions:

Firstly, the comment here says the opposite of the code. I am assuming the code is the intended behavior, right?

# Learn the features extractor using the policy loss only
# when the features_extractor is shared with the actor
with th.set_grad_enabled(not self.share_features_extractor):
    features = self.extract_features(obs, self.features_extractor)
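
For reference, here is a minimal standalone sketch (toy modules, not SB3 code) of what that set_grad_enabled guard does in the shared case: with gradients disabled around the feature extraction, the critic loss cannot update the extractor, while the critic head itself is still trained.

import torch as th
import torch.nn as nn

extractor = nn.Linear(4, 8)  # stand-in for a shared features extractor
q_net = nn.Linear(8, 1)      # stand-in for a critic (Q-network) head
obs = th.randn(2, 4)

# Same pattern as the snippet above when share_features_extractor is True
with th.set_grad_enabled(False):
    features = extractor(obs)

q_loss = q_net(features).mean()
q_loss.backward()

print(extractor.weight.grad)  # None: the critic loss never reaches the extractor
print(q_net.weight.grad)      # a tensor: the critic head is still updated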

Secondly, does this only make sense in off-policy settings (when using Q-critics)? I checked out the ActorCriticPolicy, which is used in on-policy algorithms, and there both the actor and the critic loss influence the feature extractor's parameters (if shared):

features = self.extract_features(obs)
if self.share_features_extractor:
    latent_pi, latent_vf = self.mlp_extractor(features)
else:
    pi_features, vf_features = features
    latent_pi = self.mlp_extractor.forward_actor(pi_features)
    latent_vf = self.mlp_extractor.forward_critic(vf_features)

def extract_features(  # type: ignore[override]
    self, obs: PyTorchObs, features_extractor: Optional[BaseFeaturesExtractor] = None
) -> Union[th.Tensor, Tuple[th.Tensor, th.Tensor]]:
    """
    Preprocess the observation if needed and extract features.

    :param obs: Observation
    :param features_extractor: The features extractor to use. If None, then ``self.features_extractor`` is used.
    :return: The extracted features. If features extractor is not shared, returns a tuple with the
        features for the actor and the features for the critic.
    """
    if self.share_features_extractor:
        return super().extract_features(obs, self.features_extractor if features_extractor is None else features_extractor)

def forward(self, features: th.Tensor) -> Tuple[th.Tensor, th.Tensor]:
    """
    :return: latent_policy, latent_value of the specified network.
        If all layers are shared, then ``latent_policy == latent_value``
    """
    return self.forward_actor(features), self.forward_critic(features)

The features generated by the first feature extractor (self.features_extractor) are passed into the mlp_extractor, where they are used to form the actor and value latent features without any interruption of the gradient. So both the actor and the critic loss influence this feature extractor, whereas in the ContinuousCritic we prevent the feature extractor from being updated by the critic loss when it is shared with the actor.
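
To make that concrete, a toy sketch (made-up modules, not SB3 code) of the shared on-policy path: since nothing blocks the graph between the extractor and the two heads, a combined actor + critic loss produces gradients on the shared extractor.

import torch as th
import torch.nn as nn

extractor = nn.Linear(4, 8)    # stand-in for the shared features extractor
policy_head = nn.Linear(8, 2)  # stand-in for the actor latent / head
value_head = nn.Linear(8, 1)   # stand-in for the value latent / head
obs = th.randn(2, 4)

features = extractor(obs)  # no set_grad_enabled guard here
loss = policy_head(features).mean() + value_head(features).mean()  # actor + critic stand-in losses
loss.backward()

print(extractor.weight.grad is not None)  # True: both losses flow back into the shared extractor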

Firstly, the comment here says the opposite of the code.

I think the comment is correct, although a bit confusing.
When the feature extractor is shared, you don't want gradients to be backpropagated through the critic into it, hence set_grad_enabled(not self.share_features_extractor): the extractor is then trained by the policy loss only, as the comment says.

Secondly, does this only make sense in off-policy settings

This choice is based on experience. The recommendation is actually not to share the feature extractor when using off-policy algorithms. Another option would be to use the critic loss only to learn the features, but it is not recommended to use both losses when sharing the feature extractor (they can conflict).
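
If you want to set this explicitly (rather than relying on the current default), a minimal sketch assuming the usual policy_kwargs mechanism and the share_features_extractor argument of the SAC policy:

from stable_baselines3 import SAC

# Request separate feature extractors for actor and critic,
# passing the flag explicitly regardless of the default
model = SAC(
    "MlpPolicy",
    "Pendulum-v1",
    policy_kwargs=dict(share_features_extractor=False),
)
model.learn(total_timesteps=1_000)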

ActorCriticPolicy which is used in on-policy algorithms

In the on-policy setting, where the actor and critic updates are done in one go, it seems (experimentally) that you can use both losses to update the features (and you can also use separate networks).
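
For completeness, separate networks can be requested the same way on the on-policy side, since ActorCriticPolicy exposes the same flag (again a sketch assuming the standard policy_kwargs mechanism):

from stable_baselines3 import PPO

# On-policy: sharing (the default) lets both losses train the extractor;
# setting the flag to False gives the actor and critic their own extractors
model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(share_features_extractor=False),
)
model.learn(total_timesteps=1_000)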

Makes sense, thanks!