DLR-RM/stable-baselines3

[Question] TD3 algorithm: during training, why limit the next_actions?


❓ Question

In the TD3 algorithm, why are the next_actions clamped during training?
If my action range is much larger than [-1, 1], the data is truncated.
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/td3/td3.py#L171


Hello,

you can find an explanation of why the action space should be normalized here: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment
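As a minimal sketch of one way to do this (assuming a recent SB3 version built on Gymnasium and an environment with a Box action space), you can wrap the environment with `RescaleAction` so the agent always sees actions in [-1, 1]; Pendulum-v1 is used here only because its native action bounds are [-2, 2]:

```python
import gymnasium as gym
from gymnasium.wrappers import RescaleAction
from stable_baselines3 import TD3

# Pendulum-v1 has a Box action space of [-2, 2];
# rescale it to [-1, 1] so it matches the range the actor outputs.
env = gym.make("Pendulum-v1")
env = RescaleAction(env, min_action=-1.0, max_action=1.0)

model = TD3("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```

The wrapper only changes the interface the agent sees; the underlying environment still receives actions in its original bounds.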

For this piece of code:

next_actions = (self.actor_target(replay_data.next_observations) + noise).clamp(-1, 1)

the clamp is there because we assume the action is normalized to [-1, 1]; what is stored in the replay buffer is the scaled action:

# We store the scaled action in the buffer
buffer_action = scaled_action
action = self.policy.unscale_action(scaled_action)

and also because the default hyperparameters are tuned for a normalized action space centered around zero.
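For reference, the scaling between the environment's Box bounds and [-1, 1] is just a linear map. The sketch below mirrors, up to implementation details, what the policy's `scale_action` / `unscale_action` do, and shows why clamping the normalized action does not lose information as long as the buffer stores the scaled action:

```python
import numpy as np

def scale_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * ((action - low) / (high - low)) - 1.0

def unscale_action(scaled_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)

# Example: action bounds [-2, 2]
low, high = np.array([-2.0]), np.array([2.0])
a = np.array([1.5])
s = scale_action(a, low, high)                      # 0.75, inside [-1, 1]
assert np.allclose(unscale_action(s, low, high), a)  # round-trips back to 1.5
```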