jasonwu0731/trade-dst

EWC fisher matrix: Why not use all of the gradient values in model?

Closed this issue · 4 comments

While trying to reproduce EWC_train.py, I found something odd at line #51 (

model.loss_ptr_to_bp.backward()
).

I think one should use all of the gradients in the model to construct the Fisher matrix (i.e., call model.loss_grad.backward()), but the code isn't written that way.

Hi,

what do you mean by all the gradients in the model?

Andrea

Hi,
I mean that there are two loss terms in the TRADE model (

self.loss_grad = loss
), i.e., model.loss_grad = loss_ptr + loss_gate, where model.loss_ptr_to_bp = loss_ptr.

But the Fisher matrix is built from the model.loss_ptr_to_bp term only, as I mentioned above.
What if I use the model.loss_grad term instead of model.loss_ptr_to_bp?
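To make the difference concrete, here is a minimal, hypothetical sketch (not the TRADE codebase) of a diagonal Fisher estimate for a one-parameter model with two loss terms. The function names, the toy losses, and the data are all invented for illustration; the point is only that the gradient used for the squared-gradient (Fisher) accumulation comes from the *sum* of the two terms, analogous to using model.loss_grad rather than model.loss_ptr_to_bp:

```python
import numpy as np

def grad_total_loss(w, x, y, g):
    """Analytic gradient of (w*x - y)**2 + (w - g)**2 w.r.t. w.

    The first term stands in for a pointer-style loss, the second for a
    gate-style loss; their sum mirrors model.loss_grad = loss_ptr + loss_gate.
    """
    grad_ptr = 2.0 * (w * x - y) * x   # gradient of the "pointer" term
    grad_gate = 2.0 * (w - g)          # gradient of the "gate" term
    return grad_ptr + grad_gate        # gradient of the full loss

def diagonal_fisher(w, data):
    """EWC-style diagonal Fisher estimate: average squared gradient over data."""
    squared_grads = [grad_total_loss(w, x, y, g) ** 2 for x, y, g in data]
    return float(np.mean(squared_grads))

# Toy dataset of (x, y, g) triples, purely for demonstration.
data = [(1.0, 2.0, 0.5), (0.5, 1.0, 0.0), (2.0, 0.0, 1.0)]
fisher = diagonal_fisher(1.0, data)
```

In the actual code, using loss_grad would just mean calling backward() on the summed loss before reading off each parameter's `.grad` and squaring it.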

Hi,

Yeah, good point. To be honest, there is no particular reason; at the time we implemented it, we used that loss because we believed it was the more important one. But the sum of the two losses can certainly be used. It may even work better.

Thanks for letting us know.

Andrea

Ps. I read your ICLR paper on sequential dialogue, nice work 👍

Also, thanks for the fast reply :)

Ps. Thanks for reading the paper 😆 I also hope to see your PPLM paper presentation 🙏🏻