openai/glow

The purpose of the logscale_factor=3. in the actnorm function

kmkolasinski opened this issue · 3 comments

Hello, I would like to ask what is the purpose of the logscale_factor in the actnorm function here?
I couldn't find any reference in the paper that would explain the reason for this variance modification. As far as I understand this implementation, we recover the paper's description by setting logscale_factor=1. It is also clear to me that it only affects the initialization step, but it would be interesting to know whether this is some kind of trick that helped you, or something else. Thanks for the feedback.
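To make the question concrete, here is a minimal sketch (not the repo's exact code; the names `actnorm_scale` and `logs_var` are illustrative assumptions) of the parameterization pattern being asked about: the stored variable is divided by logscale_factor at init and multiplied back by it when used, so the initial forward pass is unchanged.

```python
import numpy as np

def actnorm_scale(x, logs_init, logscale_factor=3.0):
    # Sketch only: the stored variable absorbs a 1/logscale_factor ...
    logs_var = logs_init / logscale_factor
    # ... and is scaled back up when used, so the effective log-scale
    # at initialization equals logs_init either way.
    logs = logs_var * logscale_factor
    return x * np.exp(logs)

x = np.array([1.0, 2.0, 3.0])
logs_init = np.array([0.5, -0.25, 0.0])

# With logscale_factor=1 we recover the plain paper parameterization:
out_paper = x * np.exp(logs_init)
out_factor = actnorm_scale(x, logs_init, logscale_factor=3.0)
print(np.allclose(out_paper, out_factor))  # → True: identical forward pass
```

This is why the factor looks like a no-op in the forward direction; the difference only shows up in the gradients with respect to the stored variable.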

Closing this issue, since I have realized that logscale_factor actually reduces out and does nothing in this code. Sorry for bothering you with this stupid question.

Why do you say that logscale_factor reduces out? Do you mean that the neural network will learn to adjust to this factor, i.e., instead of outputting `logs` it will output a scaled-down version of it instead?

The logscale_factor can accelerate the update of the parameter. For example, let $\theta$ be the effective parameter (before logscale_factor is applied) and $\beta$ the stored variable (after logscale_factor is applied), with logscale_factor equal to 3.

$$
\begin{align*}
\theta &= 3\beta\\
\frac{\partial\mathcal{L}(3\beta)}{\partial\beta} &= 3\,\frac{\partial\mathcal{L}(3\beta)}{\partial(3\beta)} = 3\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta}\\
\theta' &= \theta - \alpha\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta} \qquad\text{(plain SGD on $\theta$)}\\
\beta' &= \beta - 3\alpha\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta} \qquad\text{(SGD on $\beta$, using the chain rule above)}\\
3\beta' &= 3\beta - 9\alpha\,\frac{\partial\mathcal{L}(\theta)}{\partial\theta}
\end{align*}
$$

So the effective step on $\theta$ is $9\alpha$ instead of $\alpha$: the factor of 3 in the parameterization multiplies the effective learning rate by $3^2 = 9$.
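The derivation above can be checked numerically. The toy quadratic loss $\mathcal{L}(\theta) = \tfrac{1}{2}(\theta - 5)^2$ is an assumption chosen for illustration; everything else follows the equations.

```python
# Numeric check: reparameterizing theta = k*beta and running SGD on beta
# multiplies the effective step on theta by k**2.
# Toy loss L(theta) = 0.5 * (theta - 5)**2 is an illustrative assumption.

def grad_L(theta):
    return theta - 5.0          # dL/dtheta for L = 0.5*(theta - 5)**2

alpha, k = 0.1, 3.0
theta0 = 1.0

# Plain SGD on theta:
theta_plain = theta0 - alpha * grad_L(theta0)

# SGD on beta, where theta = k*beta (chain rule: dL/dbeta = k * dL/dtheta):
beta = theta0 / k
beta = beta - alpha * k * grad_L(theta0)
theta_reparam = k * beta        # equals theta0 - k**2 * alpha * grad_L(theta0)

print(theta_plain)              # → 1.4
print(theta_reparam)            # → 4.6 (the step is 9x larger)
```

With $k = 3$ the reparameterized update moves $\theta$ nine times as far per step, matching the $9\alpha$ coefficient in the last line of the derivation.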