pat-coady/trpo

Scaler vs. BatchNorm

pender opened this issue · 3 comments

Hi @pat-coady -- I was wondering why you use a custom Scaler Python object instead of standard batch norm (e.g. the TensorFlow kind)? Wouldn't sticking a batch-norm layer onto the front of the policy net achieve the same thing, require less code, and be compatible with TF savers? Sorry if I am misunderstanding!
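(For concreteness, a rough sketch of what that suggestion could look like with the Keras API -- the layer sizes and dimensions here are made up, and the repo's actual policy network is built differently:)

```python
import tensorflow as tf

obs_dim, act_dim = 17, 6  # illustrative dimensions only

# Hypothetical policy net with a BatchNormalization layer on the front,
# in place of manually scaling observations before feeding them in.
policy_net = tf.keras.Sequential([
    tf.keras.Input(shape=(obs_dim,)),
    tf.keras.layers.BatchNormalization(),         # per-feature normalization
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(act_dim),               # e.g. action means
])
```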

@pender
Good question. The idea of these scalers is that they change very slowly from batch to batch. And by 1M episodes, they hardly budge at all from episode to episode. If you learn that a certain observation + action vector leads to poor rewards, you don't want batch norm to scale that vector to a different spot in future episodes depending on what else happens to be in the batch.
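(To make the contrast concrete, here is a minimal sketch of a slowly-updating, per-feature scaler of the kind described above -- the class name and details are illustrative, not the repo's actual Scaler implementation:)

```python
import numpy as np

class RunningScaler:
    """Per-feature running mean/variance for observation scaling.

    Statistics are accumulated over all data seen so far, so each new
    batch nudges them only slightly -- unlike batch norm, which
    re-normalizes using the current batch.
    """

    def __init__(self, obs_dim):
        self.n = 0                        # total samples seen
        self.mean = np.zeros(obs_dim)
        self.m2 = np.zeros(obs_dim)       # sum of squared deviations

    def update(self, x):
        """Fold a batch of observations (shape [batch, obs_dim]) into the stats."""
        batch_n = x.shape[0]
        batch_mean = x.mean(axis=0)
        batch_m2 = ((x - batch_mean) ** 2).sum(axis=0)
        delta = batch_mean - self.mean
        total_n = self.n + batch_n
        # Chan et al. parallel mean/variance update
        self.mean += delta * batch_n / total_n
        self.m2 += batch_m2 + delta ** 2 * self.n * batch_n / total_n
        self.n = total_n

    def scale(self, x):
        """Normalize observations with the accumulated statistics."""
        std = np.sqrt(self.m2 / max(self.n - 1, 1)) + 1e-8
        return (x - self.mean) / std
```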

Also, reinforcement learning agents are not trained on a stationary distribution. As the agent learns, it explores different areas and stops visiting others.

I wish I had documented it, but I did try normalization on a per-batch (or even per-10-batch) basis. The learning performance was awful. All that said, I didn't implement batch normalization itself, and I don't want to discourage you from trying it.

Interesting, thanks for the explanation. I wonder if you could achieve the right level of flexibility in the rolling average of the TF BatchNorm layer by adjusting its momentum argument ... maybe I'll give it a shot. Thanks again.
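(The knob in question is the `momentum` argument of `tf.keras.layers.BatchNormalization`; the TF 1.x `tf.layers.batch_normalization` takes the same argument. A sketch:)

```python
import tensorflow as tf

# Moving statistics update inside the layer (per feature):
#   moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
# so a momentum close to 1.0 makes the running stats drift very slowly,
# similar in spirit to the Scaler.
bn = tf.keras.layers.BatchNormalization(momentum=0.999)  # default is 0.99
```

One caveat: during training the layer still normalizes each batch with that batch's own statistics; the moving averages are only used at inference time, so a high momentum alone may not remove the batch-to-batch variation described above.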

@pender

That'd be cool if you gave it a shot. Looking at my notes, when I tried it I normalized per feature. I'm curious how true batch norm does.