DLR-RM/stable-baselines3

[Question] TD3 algorithm: during training, why limit the next_actions?


❓ Question

In the TD3 algorithm, why are the next_actions clamped during training?
If my action range is much larger than [-1, 1], the data is truncated.
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/td3/td3.py#L171


Hello,

you can find an explanation of why the action space should be normalized here: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment
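As a minimal sketch of one way to do this (assuming a recent SB3 version built on Gymnasium and an environment with a Box action space), you can wrap the environment with `RescaleAction` so the agent always sees actions in [-1, 1]; Pendulum-v1 is used here only because its native action bounds are [-2, 2]:

```python
import gymnasium as gym
from gymnasium.wrappers import RescaleAction
from stable_baselines3 import TD3

# Pendulum-v1 has a Box action space of [-2, 2];
# rescale it to [-1, 1] so it matches the range the actor outputs.
env = gym.make("Pendulum-v1")
env = RescaleAction(env, min_action=-1.0, max_action=1.0)

model = TD3("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```

The wrapper only changes the interface the agent sees; the underlying environment still receives actions in its original bounds.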

For this piece of code:

next_actions = (self.actor_target(replay_data.next_observations) + noise).clamp(-1, 1)

the clamp is there because we assume the action is normalized to [-1, 1]; what is stored in the replay buffer is the scaled action:

# We store the scaled action in the buffer
buffer_action = scaled_action
action = self.policy.unscale_action(scaled_action)

and also because the default hyperparameters are tuned for a normalized action space centered around zero.
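For reference, the scaling between the environment's Box bounds and [-1, 1] is just a linear map. The sketch below mirrors, up to implementation details, what the policy's `scale_action` / `unscale_action` do, and shows why clamping the normalized action does not lose information as long as the buffer stores the scaled action:

```python
import numpy as np

def scale_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * ((action - low) / (high - low)) - 1.0

def unscale_action(scaled_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)

# Example: action bounds [-2, 2]
low, high = np.array([-2.0]), np.array([2.0])
a = np.array([1.5])
s = scale_action(a, low, high)                      # 0.75, inside [-1, 1]
assert np.allclose(unscale_action(s, low, high), a)  # round-trips back to 1.5
```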