[Question] TD3 algorithm: during training, why limit the next_actions?
Danny551 commented
❓ Question
In the TD3 algorithm, why are the next_actions limited during training?
If my action range is much larger than [-1, 1], the actions get truncated by this clipping:
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/td3/td3.py#L171
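For context, that line is part of TD3's target policy smoothing step: Gaussian noise is added to the target actor's actions and the result is clamped to [-1, 1]. A minimal sketch of that pattern (the tensor names, the linear stand-in for the target actor, and the noise constants are illustrative assumptions, not the exact SB3 source):

```python
import torch as th

# SB3's default TD3 smoothing hyperparameters (assumed here for illustration).
target_policy_noise = 0.2  # std of the Gaussian smoothing noise
target_noise_clip = 0.5    # range the noise itself is clipped to

batch_size, obs_dim, act_dim = 32, 4, 2
next_observations = th.randn(batch_size, obs_dim)  # hypothetical batch of next states
actor_target = th.nn.Linear(obs_dim, act_dim)      # stand-in for the target actor network

with th.no_grad():
    # Add clipped Gaussian noise to the target actions (target policy smoothing)...
    noise = (th.randn(batch_size, act_dim) * target_policy_noise).clamp(
        -target_noise_clip, target_noise_clip
    )
    # ...then clamp the result back into the normalized action range [-1, 1],
    # which is the clipping the question asks about.
    next_actions = (th.tanh(actor_target(next_observations)) + noise).clamp(-1.0, 1.0)
```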
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- If code is provided, it is minimal and working
- If code is provided, it is formatted using markdown code blocks for both code and stack traces
araffin commented
Hello,
You can find some explanation here of why the action space should be normalized: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment
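In practice, if your environment's native action range is wider than [-1, 1], one common fix (a sketch, assuming a gymnasium environment; Pendulum-v1 just serves as an example of a Box action space) is to wrap it so the agent only ever sees a normalized space:

```python
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

# Pendulum-v1's native action space is Box(-2, 2); any wider Box works the same way.
env = gym.make("Pendulum-v1")

# Present a normalized [-1, 1] action space to the agent; the wrapper affinely
# rescales each action back to the env's native range before stepping.
env = RescaleAction(env, min_action=-1.0, max_action=1.0)
```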
For this piece of code: it is because we assume the action is normalized (see `stable_baselines3/common/off_policy_algorithm.py`, lines 400 to 402 at commit 3d59b5c), and also because the default hyperparameters are tuned for a normalized action space, centered around zero.
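The rescaling itself is a simple affine map. A sketch of the kind of conversion SB3 performs internally between the policy's [-1, 1] output and a Box action space (the bounds below are hypothetical, and these helpers are illustrative rather than SB3's own functions):

```python
import numpy as np

low, high = np.array([-10.0]), np.array([10.0])  # hypothetical Box bounds

def scale_action(action: np.ndarray) -> np.ndarray:
    """Map an action from [low, high] to the normalized range [-1, 1]."""
    return 2.0 * (action - low) / (high - low) - 1.0

def unscale_action(scaled_action: np.ndarray) -> np.ndarray:
    """Map a normalized action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)

# Round trip: scaling then unscaling recovers the original action.
assert np.allclose(unscale_action(scale_action(np.array([5.0]))), [5.0])
```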