Moving average weight parameter
A question regarding the following code in linear_block.py, in the method update_cov:
....
self._activations_cov.add_to_average(activation_cov, cov_ema_decay)
self._sensitivities_cov.add_to_average(sensitivity_cov, cov_ema_decay)
The method add_to_average(self, value: torch.Tensor, decay: float = 1.0, weight: float = 1.0)
takes a weight parameter.
Shouldn't the weight be set to 1 - cov_ema_decay,
so that decay and weight sum to 1.0?
This is done that way in other implementations, e.g. here: https://github.com/lzhangbv/kfac_pytorch/blob/master/kfac/utils.py#L66 or here: https://github.com/Thrandis/EKFAC-pytorch/blob/master/kfac.py#L174 (there, cov_ema_decay corresponds to alpha).
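For concreteness, a minimal sketch of the convention used in the linked implementations (the function name and signature here are illustrative, not the library's actual API): with weight = 1 - decay, the update is a convex combination whose coefficients sum to 1.

```python
def add_to_average(avg: float, value: float, decay: float) -> float:
    """Convex-combination EMA, as in the linked kfac_pytorch / EKFAC code:
    new_avg = decay * avg + (1 - decay) * value.
    The two coefficients sum to 1, so a constant input stays constant."""
    return decay * avg + (1.0 - decay) * value
```

Note that if avg is initialized to zero, the first few estimates are biased toward zero, since the first value only enters with coefficient (1 - decay).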
I use an implementation of the exponential moving average that is not prone to the initialization problem: typically, the very first value disproportionately impacts the running average, but that is not the case in my implementation. The weight here plays a different role than in the implementations you linked. Mine is similar to the official TF K-FAC implementation: https://github.com/tensorflow/kfac
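A minimal sketch of the accumulated-weight variant being described (similar in spirit to the TF K-FAC moving average; class and method names here are hypothetical, not the library's API): the running sum and the total weight are decayed separately, and the average is their ratio, so the very first update returns the first value exactly regardless of the decay.

```python
class WeightedEMA:
    """Accumulated-weight EMA: avoids the zero-initialization bias of the
    plain convex-combination EMA by tracking the decayed sum and the decayed
    total weight separately, then dividing on read."""

    def __init__(self) -> None:
        self.total = 0.0       # decayed weighted sum of values
        self.weight_sum = 0.0  # decayed sum of weights

    def add_to_average(self, value: float, decay: float, weight: float = 1.0) -> float:
        self.total = decay * self.total + weight * value
        self.weight_sum = decay * self.weight_sum + weight
        # On the first call, weight_sum == weight, so this returns `value`
        # exactly: the first value is neither over- nor under-weighted.
        return self.total / self.weight_sum
```

After many updates the decayed weight sum approaches its fixed point and the behavior converges to an ordinary exponential moving average; only the early estimates differ from the 1 - decay convention.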