YunzhuLi/InfoGAIL

standardize the advantage function

Closed · 1 comment

In models.py:

        # Standardize the advantage function to have mean=0 and std=1
        advants_n = np.concatenate([path["advants"] for path in paths])
        # advants_n -= advants_n.mean()
        advants_n /= (advants_n.std() + 1e-8)

Is it a typo that `advants_n -= advants_n.mean()` is commented out? The comment above it says the advantage function should have mean 0.

I found this normalization in John Schulman's implementation of TRPO. It is a way of stabilizing the gradients during backpropagation. However, subtracting the mean introduces bias, which is why I only divide by the standard deviation and do not subtract the mean.
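
For anyone comparing the two options, here is a minimal sketch of both variants, assuming only NumPy. The function names `scale_advantages` and `standardize_advantages` are hypothetical and do not appear in the repo, and the sample values are made up for illustration.

```python
import numpy as np


def scale_advantages(advants, eps=1e-8):
    """Scale-only variant (as in the snippet above): divide by the std.

    The mean is left untouched, so every advantage keeps its sign.
    """
    advants = np.asarray(advants, dtype=np.float64)
    return advants / (advants.std() + eps)


def standardize_advantages(advants, eps=1e-8):
    """Full standardization: subtract the mean, then divide by the std.

    The result has mean ~0 and std ~1, but values below the batch mean
    become negative even if they were positive before.
    """
    advants = np.asarray(advants, dtype=np.float64)
    return (advants - advants.mean()) / (advants.std() + eps)


# Illustrative batch in which every advantage is positive (made-up numbers).
advants = np.array([1.0, 2.0, 3.0, 6.0])
scaled = scale_advantages(advants)
standardized = standardize_advantages(advants)
```

With this batch, `scaled` stays all positive, while `standardized` has negative entries for the samples below the batch mean. That sign change is the practical difference between the two lines in the snippet.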