EWC Fisher matrix: why not use all of the gradient values in the model?
While trying to reproduce EWC_train.py
, I found something odd at line #51 (
Line 51 in 4ee53da
). I think one should use all of the gradient values in the model to construct the Fisher matrix (i.e., call model.loss_grad.backward()
), but the code is not written that way.
Hi,
what do you mean by all of the gradients in the model?
Andrea
Hi,
I mean that there are two loss terms in the TRADE model (
Line 103 in 4ee53da
): model.loss_grad = loss_ptr + loss_gate
, where model.loss_ptr_to_bp = loss_ptr
.
But the Fisher matrix only uses the model.loss_ptr_to_bp
term, as I mentioned above.
What if I just use the model.loss_grad
term instead of model.loss_ptr_to_bp
?
Hi,
Yeah, good point. To be honest, there is no particular reason; at the time we implemented it, we used that loss because we believed it was the more important one. But the sum of the two losses can certainly be used. It may even work better.
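For reference, the combined-loss variant discussed here can be sketched roughly as follows. This is a minimal toy sketch in PyTorch, not the repo's actual code: `TinyModel`, the random data, and the two cross-entropy terms are illustrative stand-ins for TRADE's pointer and gate losses.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Illustrative stand-in for TRADE: two heads producing two loss terms."""
    def __init__(self):
        super().__init__()
        self.ptr_head = nn.Linear(4, 3)   # stands in for the pointer decoder
        self.gate_head = nn.Linear(4, 2)  # stands in for the slot gate

    def forward(self, x):
        return self.ptr_head(x), self.gate_head(x)

def estimate_diag_fisher(model, batches):
    """Diagonal Fisher estimate: average squared gradients of the
    *combined* loss (loss_ptr + loss_gate) over a set of batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y_ptr, y_gate in batches:
        model.zero_grad()
        ptr_logits, gate_logits = model(x)
        loss_ptr = nn.functional.cross_entropy(ptr_logits, y_ptr)
        loss_gate = nn.functional.cross_entropy(gate_logits, y_gate)
        loss_grad = loss_ptr + loss_gate  # combined loss, as suggested above
        loss_grad.backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2
    return {n: f / len(batches) for n, f in fisher.items()}

torch.manual_seed(0)
batches = [(torch.randn(8, 4),
            torch.randint(0, 3, (8,)),
            torch.randint(0, 2, (8,)))
           for _ in range(4)]
model = TinyModel()
fisher = estimate_diag_fisher(model, batches)
```

Swapping in the combined loss only changes which scalar `.backward()` is called on; the per-parameter squared-gradient accumulation stays the same.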
Thanks for letting us know.
Andrea
P.S. I read your ICLR paper on sequential dialogue, nice work 👍
Also, thanks for the fast reply :)
P.S. Thanks for reading the paper 😆 I also hope to see your PPLM paper presentation 🙏🏻