What does the hyperparameter "normalize" refer to in PPO?
cboettig opened this issue · 4 comments
PPO hyperparameter configurations often set normalize to a boolean, e.g.
rl-baselines3-zoo/hyperparams/ppo.yml (line 44 at 8ea4f4a)
It's not clear to me what configuration this particular hyperparameter refers to (if anything). For example, I see A2C tunes the normalize_advantage
parameter, but that's not a hyperparameter for PPO. PPO has a normalize_images boolean, but I don't think that's it either. Is this controlling whether or not the env gets wrapped in VecNormalize?
(For context: I've found the zoo scripts particularly handy for tuning even my custom environments, but I am struggling to reproduce some of the tuned results by passing the best hyperparameters directly to fresh initializations of the RL algorithms. Thanks for the amazing work you've done on stable-baselines and the zoo!)
Yes, this parameter refers to wrapping the environment with VecNormalize (the env is wrapped here).
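For illustration, a minimal sketch of what `normalize: true` amounts to (the CartPole-v1 env and the stats file name are placeholder choices for the example, not what the zoo uses for this entry):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# `normalize: true` means the training env is wrapped in VecNormalize
# (observation and return normalization enabled by default).
venv = make_vec_env("CartPole-v1", n_envs=1)
venv = VecNormalize(venv)

model = PPO("MlpPolicy", venv, verbose=0)
model.learn(total_timesteps=10_000)

# The running normalization statistics are part of the trained artifact:
# save them so the results can be reproduced at evaluation time.
venv.save("vecnormalize.pkl")
```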
PS: Thanks for the kind words, which make the return from vacation easier :)
It's not clear to me what configuration this particular hyperparameter refers to
Yes, those are parameters to the VecNormalize wrapper.
By default, observation and return normalization are enabled, but you can change that as in
https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/sac.yml#L208
A PR that documents this value in the config would be appreciated ;)
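For illustration, a rough sketch of what such per-key settings end up doing (the Pendulum-v1 env and the already-parsed dict are assumptions for the example; the zoo handles the yaml parsing for you):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# A dict-valued `normalize:` entry in the yaml ends up as keyword arguments
# for VecNormalize; here reward normalization is turned off while keeping
# observation normalization on.
normalize_kwargs = {"norm_obs": True, "norm_reward": False}

venv = make_vec_env("Pendulum-v1", n_envs=1)
venv = VecNormalize(venv, **normalize_kwargs)
```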
Awesome, thanks. So just to be sure I got this right: all algos, not just PPO, use the VecNormalize wrapper around the environment, with norm_obs and norm_reward set to True by default (unless overridden in the hyperparameters yaml files as shown), and with the other parameters as listed at
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L29-L37 ?
Or it looks like some of those parameters are overwritten? e.g. gamma for the normalization is set to whatever gamma the agent is using? Happy to prep a PR, and sorry for all the questions; my earlier attempts to eyeball the source definitely misread this (it looked to me like the default normalization was False, a la https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/exp_manager.py#L85). Again, apologies for being dense.
So just to be sure I got this right: all algos, not just PPO, use the VecNormalize wrapper around the environment, with norm_obs and norm_reward set to True by default
unless overridden in the hyperparameters yaml files as shown?
yes
Or it looks like some of those parameters are overwritten? e.g. gamma for the normalization is set to whatever gamma the agent is using?
gamma is the only one we override automatically for correctness (and only if present in the hyperparameters).
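Roughly, the override looks like this sketch (the dict names and values are placeholders, not the zoo's exact code):

```python
# If the agent's hyperparameters define a discount factor, the same value is
# passed on to VecNormalize so that return normalization is discounted
# consistently with the agent.
hyperparams = {"gamma": 0.98, "n_steps": 2048}            # placeholder agent hyperparameters
normalize_kwargs = {"norm_obs": True, "norm_reward": True}

if "gamma" in hyperparams:
    normalize_kwargs["gamma"] = hyperparams["gamma"]
```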
We also deactivate reward normalization when evaluating the agent (to report the true reward, even though it is not needed anymore since we recently switched to the Monitor wrapper).
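A minimal sketch of what that looks like with the SB3 API (the stats file name is just a placeholder matching the earlier example):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# Rebuild the eval env with the saved statistics, then freeze them and
# turn reward normalization off so the reported reward is the true env reward.
eval_env = make_vec_env("CartPole-v1", n_envs=1)
eval_env = VecNormalize.load("vecnormalize.pkl", eval_env)
eval_env.training = False     # do not update running statistics during evaluation
eval_env.norm_reward = False  # report unnormalized rewards
```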
Happy to prep a PR, and sorry for all the questions,
no pb ;)
If you were confused, then you were probably not the only one.