InterDigitalInc/CompressAI

Aux loss increases after a couple thousand iterations


First of all, I want to thank you for your valuable work!

I have a question about the aux loss. I am trying to train a video compression framework with a flow compressor and a residual compressor. I initialized the optimizer and the aux optimizer as shown in the examples directory. I set the aux learning rate to 1e-3, and the aux loss (I sum the flow and residual aux losses) first decreases as training continues. However, at some point it starts to increase. I tried decreasing the aux learning rate to 1e-5 after enough iterations; the aux loss first decreases again, but then bounces back and starts to increase. Do you have any idea what could cause this behavior?

Thanks a lot
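For reference, a minimal sketch of the two-optimizer setup described above, loosely following the `configure_optimizers` pattern in CompressAI's examples directory (the learning rates and the idea of wrapping both compressors in one parent `nn.Module` are illustrative assumptions, not the exact code):

    from torch import optim

    def configure_optimizers(net, lr=1e-4, aux_lr=1e-3):
        # Split parameters: the ".quantiles" parameters of the entropy
        # bottlenecks go to the aux optimizer, everything else to the
        # main (rate-distortion) optimizer.
        parameters = {
            n for n, p in net.named_parameters()
            if not n.endswith(".quantiles") and p.requires_grad
        }
        aux_parameters = {
            n for n, p in net.named_parameters()
            if n.endswith(".quantiles") and p.requires_grad
        }

        params_dict = dict(net.named_parameters())
        # Sanity check: the two sets are disjoint and cover all parameters.
        assert not (parameters & aux_parameters)
        assert parameters | aux_parameters == params_dict.keys()

        optimizer = optim.Adam((params_dict[n] for n in sorted(parameters)), lr=lr)
        aux_optimizer = optim.Adam((params_dict[n] for n in sorted(aux_parameters)), lr=aux_lr)
        return optimizer, aux_optimizer

Here `net` would be a parent `nn.Module` holding both the flow and residual compressors, so that a single optimizer pair covers both; two separate pairs would work just as well.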

I have a similar problem. The aux loss of the EntropyBottleneck does not converge to 0: it reaches about 10 and then converges at about 50. Are there any training strategies for getting the best results?

Thanks for your work.

Hi,
sorry @makinyilmaz for the late reply. As you saw, the auxiliary loss aims at refining the boundaries (quantiles) of the CDFs that will be used for the actual entropy coding with a range coder, e.g. rANS. We followed the original TensorFlow Compression training strategy, which learns both the entropy parameters and the rest of the network during the whole training process. The training loop contained in CompressAI is a minimalist template trainer, so please feel free to try other strategies. We can compare and discuss observations in the Discussions section.
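For the two-compressor setup described in the question, that relationship could look roughly like this (a sketch assuming the flow and residual compressors are `compressai` `CompressionModel` subclasses; `flow_codec` and `residual_codec` are hypothetical names):

    # During training: sum the auxiliary losses of both models and step the
    # aux optimizer (the main rate-distortion optimizer is handled separately).
    aux_optimizer.zero_grad()
    aux_loss = flow_codec.aux_loss() + residual_codec.aux_loss()
    aux_loss.backward()
    aux_optimizer.step()

    # Before actual entropy coding with compress()/decompress(): build the
    # quantized CDF tables used by the range coder from the learned quantiles.
    flow_codec.update(force=True)
    residual_codec.update(force=True)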

Hi @fracape, thanks for your reply. I tried applying separate gradient clipping on the aux_parameters in addition to the network parameters, and observed that the aux loss converged. Dear @ildmxGL, you may try a similar strategy.

Hi @makinyilmaz, can you share how to implement the separate gradient clipping on the aux_parameters?
Here is my code; it does not work.

        # Main rate-distortion loss: backprop, clip the gradients of the
        # network parameters, then step the main optimizer.
        out_criterion['loss'].backward()
        torch.nn.utils.clip_grad_norm_((params_dict[n] for n in sorted(parameters)), 1)
        optimizer.step()

        # Auxiliary (quantiles) loss: compute_aux_loss() also runs backward()
        # (backward=True), then clip the aux parameter gradients and step the
        # aux optimizer.
        aux_loss = compute_aux_loss(model, accelerator, backward=True)
        torch.nn.utils.clip_grad_norm_((params_dict[n] for n in sorted(aux_parameters)), 1)
        aux_optimizer.step()

@xkyi1111 My guess is that you may need to clip to a smaller value, e.g. 0.1 or 0.01. Perhaps:

print(norm(aux_loss.grad())) # determine reasonable value

if aux_loss < 100:
    grad_clip(..., reasonable_value)

Also note that beyond a certain point, minimizing aux_loss doesn't have too much effect on entropy coder performance.

@YodaEmbedding thanks for your reply! I have tried clipping to a smaller value and the result is unchanged. Besides, aux_loss.grad() is None. In my code, after a few thousand iterations, both the aux loss and the MSE loss become NaN, which I thought was caused by the increase in the aux loss. Is this possible?

What value of aux_loss do you get after the zeroth, first, and last epochs?

Sorry, I meant to say: try checking the norm of the aux_parameters gradients.
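For example (a rough sketch reusing the `params_dict` and `aux_parameters` names from the snippet above; `max_norm=0.1` is just the smaller clip value suggested earlier, not a tuned recommendation):

    import torch

    # After the aux loss backward pass: clip_grad_norm_ returns the total norm
    # of the given parameters' gradients *before* clipping, so it doubles as
    # the diagnostic print suggested above.
    aux_params = [params_dict[n] for n in sorted(aux_parameters)]
    total_norm = torch.nn.utils.clip_grad_norm_(aux_params, max_norm=0.1)
    print(f"aux grad norm: {total_norm.item():.4f}")
    aux_optimizer.step()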