openai/glow

The purpose of the logscale_factor=3. in the actnorm function

kmkolasinski opened this issue · 3 comments

Hello, I would like to ask what is the purpose of the logscale_factor in the actnorm function here?
I couldn't find any reference in the paper that would explain the reason for this variance modification. As far as I understand this implementation, we recover the paper's description by setting logscale_factor=1. It is also clear to me that it only affects the initialization step, but it would be interesting to know whether this is some kind of trick that helped you, or something else. Thanks for the feedback.
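To make the question concrete, here is a minimal sketch (not the repo's exact code; the names `actnorm_scale` and `logs_var` are illustrative assumptions) of the parameterization pattern being asked about: the stored variable is divided by logscale_factor at init and multiplied back by it when used, so the initial forward pass is unchanged.

```python
import numpy as np

def actnorm_scale(x, logs_init, logscale_factor=3.0):
    # Sketch only: the stored variable absorbs a 1/logscale_factor ...
    logs_var = logs_init / logscale_factor
    # ... and is scaled back up when used, so the effective log-scale
    # at initialization equals logs_init either way.
    logs = logs_var * logscale_factor
    return x * np.exp(logs)

x = np.array([1.0, 2.0, 3.0])
logs_init = np.array([0.5, -0.25, 0.0])

# With logscale_factor=1 we recover the plain paper parameterization:
out_paper = x * np.exp(logs_init)
out_factor = actnorm_scale(x, logs_init, logscale_factor=3.0)
print(np.allclose(out_paper, out_factor))  # → True: identical forward pass
```

This is why the factor looks like a no-op in the forward direction; the difference only shows up in the gradients with respect to the stored variable.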

Closing this issue, since I have realized that logscale_factor actually reduces out and does nothing in this code. Sorry for bothering you with this stupid question.

Why do you say that logscale_factor reduces out? Do you mean that the neural network will learn to adjust to this factor, i.e., instead of outputting `logs` it will output a scaled-down version of it instead?

The logscale_factor can accelerate the update of the parameter. For example, let $\theta$ be the effective parameter (before logscale_factor is applied) and $\beta$ the stored variable (after logscale_factor is applied), with logscale_factor equal to 3.

$$
\begin{align*}
\theta &= 3\beta\\
\frac{\partial\mathcal{L}(3\beta)}{\partial\beta} &= 3\,\frac{\partial\mathcal{L}(3\beta)}{\partial(3\beta)} = 3\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta}\\
\theta' &= \theta - \alpha\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta} \qquad\text{(plain SGD on $\theta$)}\\
\beta' &= \beta - 3\alpha\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta} \qquad\text{(SGD on $\beta$, using the chain rule above)}\\
3\beta' &= 3\beta - 9\alpha\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta}
\end{align*}
$$

So the effective step on $\theta$ is $9\alpha$ instead of $\alpha$: the factor of 3 in the parameterization multiplies the effective learning rate by $3^2 = 9$.
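The derivation above can be checked numerically. The toy quadratic loss $\mathcal{L}(\theta) = \tfrac{1}{2}(\theta - 5)^2$ is an assumption chosen for illustration; everything else follows the equations.

```python
# Numeric check: reparameterizing theta = k*beta and running SGD on beta
# multiplies the effective step on theta by k**2.
# Toy loss L(theta) = 0.5 * (theta - 5)**2 is an illustrative assumption.

def grad_L(theta):
    return theta - 5.0          # dL/dtheta for L = 0.5*(theta - 5)**2

alpha, k = 0.1, 3.0
theta0 = 1.0

# Plain SGD on theta:
theta_plain = theta0 - alpha * grad_L(theta0)

# SGD on beta, where theta = k*beta (chain rule: dL/dbeta = k * dL/dtheta):
beta = theta0 / k
beta = beta - alpha * k * grad_L(theta0)
theta_reparam = k * beta        # equals theta0 - k**2 * alpha * grad_L(theta0)

print(theta_plain)              # → 1.4
print(theta_reparam)            # → 4.6 (the step is 9x larger)
```

With $k = 3$ the reparameterized update moves $\theta$ nine times as far per step, matching the $9\alpha$ coefficient in the last line of the derivation.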