openai/coinrun

batch norm always has is_training = True

Florence-C opened this issue · 2 comments

Hello,

I have two questions regarding batch normalization. In the policy, when applying a batchnorm, the is_training parameter is always set to True.
Why is the batch norm in training mode for both act_model and train_model in PPO? More precisely, why not set the batchnorm to test mode when collecting data (with the act model)?

Second, how is the batchnorm layer applied at test time? Is it still in training mode?

Thank you in advance!

This repo only supports batch normalizing based on the statistics of the current batch (which is what you get when passing is_training=True). In practice this works reasonably well for both training and testing. However, test performance will slightly increase if you instead normalize based on an average of the statistics of many batches. This is what was done in the paper -- you have to save a moving average of the statistics from training and restore them at test time.
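To make the distinction concrete, here is a minimal NumPy sketch (not the repo's actual code) of the two modes: with is_training=True the layer normalizes using the current batch's mean and variance, while also accumulating moving averages that could be saved and restored for test time.

```python
import numpy as np

class BatchNorm1D:
    """Toy batch-norm layer illustrating batch stats vs. moving-average stats."""

    def __init__(self, dim, momentum=0.99, eps=1e-5):
        self.gamma = np.ones(dim)        # learned scale
        self.beta = np.zeros(dim)        # learned shift
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, is_training):
        if is_training:
            # Normalize with the current batch's statistics, and update the
            # moving averages (these are what you'd save for test time).
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Normalize with the saved moving averages instead.
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta
```

The repo's behavior corresponds to always calling this with is_training=True; the paper's test-time procedure corresponds to training with is_training=True, saving moving_mean/moving_var, and evaluating with is_training=False.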

Regarding act_model and train_model, it's important that we normalize with similar statistics in both cases. If our rollouts collect data using different normalization statistics (is_training=False), that will introduce distributional shift (between the rollout policy and the current/training policy) that could make the RL training unstable. I'm not sure how detrimental this would be in practice, but I wouldn't expect it to work well.

Thanks for your answer!