standardize the advantage function

Question

Closed this issue 7 years ago · 1 comments

        # Standardize the advantage function to have mean=0 and std=1
        advants_n = np.concatenate([path["advants"] for path in paths])
        # advants_n -= advants_n.mean()
        advants_n /= (advants_n.std() + 1e-8)

Is it a typo that advants_n -= advants_n.mean() is commented out? The comment above it says the advantage function should have mean 0.

Answer 1 · 2017-08-16T07:56:43.000Z

I found this normalization in John Schulman's implementation of TRPO. It is a way of stabilizing the gradients during backpropagation. However, subtracting the mean would introduce bias, which is why I only divide by the standard deviation and do not subtract the mean.
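
To make the difference concrete, here is a minimal sketch comparing the two variants on a toy batch. The helper names scale_advantages and standardize_advantages and the sample values are hypothetical, made up for illustration; only the paths / "advants" layout and the 1e-8 epsilon come from the snippet in the question.

```python
import numpy as np


def scale_advantages(advants, eps=1e-8):
    """Divide by the standard deviation only (the variant kept in the snippet)."""
    advants = np.asarray(advants, dtype=np.float64)
    return advants / (advants.std() + eps)


def standardize_advantages(advants, eps=1e-8):
    """Subtract the mean, then divide by the standard deviation (the commented-out variant)."""
    advants = np.asarray(advants, dtype=np.float64)
    return (advants - advants.mean()) / (advants.std() + eps)


# Toy data in the same layout as the question: a list of paths, each holding "advants".
# The numbers are made up for illustration.
paths = [
    {"advants": np.array([1.0, 2.0, 3.0])},
    {"advants": np.array([4.0, 6.0])},
]
advants_n = np.concatenate([path["advants"] for path in paths])

scaled = scale_advantages(advants_n)              # std is about 1, mean is not 0
standardized = standardize_advantages(advants_n)  # mean is about 0, std is about 1

print("scaled:       mean=%.3f std=%.3f" % (scaled.mean(), scaled.std()))
print("standardized: mean=%.3f std=%.3f" % (standardized.mean(), standardized.std()))
```

Scaling alone keeps the sign of every advantage, whereas subtracting the mean can flip the sign of samples that lie below the batch mean. With scaling only, the comment in the original snippet ("mean=0 and std=1") no longer matches what the code does, so it would be clearer to update that comment to mention std=1 only.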