hahnyuan/PB-LLM

Why is the importance of weights evaluated by w^2/(H_ii)^2, instead of w^2/(H_ii) as in SparseGPT?

Xingrun-Xing opened this issue · 2 comments


I followed the implementation in their code: https://github.com/IST-DASLab/sparsegpt/blob/c3bbf613a1822229767f4d8870b933049b8bef15/sparsegpt.py#L96C21-L96C78
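For context, here is a minimal, self-contained sketch of what the linked line computes. The shapes, the toy `W` and `X`, the damping term, and the 50% sparsity are illustrative assumptions, not values from either repo:

```python
import torch

torch.manual_seed(0)

# Toy layer: 4 output rows, 6 input columns (illustrative shapes only).
W = torch.randn(4, 6)

# SPD Hessian proxy H = 2 X X^T built from calibration inputs, as in SparseGPT;
# the small diagonal damping is added here for numerical stability.
X = torch.randn(6, 32)
H = 2 * X @ X.T + 1e-2 * torch.eye(6)

# SparseGPT precomputation: `Hinv` is the *upper Cholesky factor* of H^{-1},
# not H^{-1} itself.
Hinv = torch.linalg.cholesky(
    torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True
)

# The criterion from the linked line: importance = w^2 / diag(Hinv)^2.
importance = W ** 2 / torch.diag(Hinv).reshape(1, -1) ** 2

# Prune the fraction of weights with the lowest importance (True = pruned).
sparsity = 0.5
thresh = torch.sort(importance.flatten())[0][int(importance.numel() * sparsity)]
mask = importance <= thresh
print(mask.float().mean())  # ~0.5
```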

Thanks for your reply. But from OBS/OBC/SparseGPT, we know that delta_loss = w^2/(H_ii), not w^2/(H_ii)^2. Do you know why we should use w^2/(H_ii)^2 as the importance metric?
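A likely resolution, based on the Cholesky trick used by GPTQ/SparseGPT (hedged, since the thread itself does not spell this out): in the linked code, `Hinv1` is the upper Cholesky factor U of H^{-1}, not H^{-1} itself, and the square of its diagonal equals the diagonal of the inverse Hessian restricted to the weights not yet eliminated. Under that reading, the squared denominator in the code recovers exactly the OBS denominator:

```latex
% OBS saliency of pruning weight w_q; the constant 1/2 does not affect ranking:
\[
  \delta L_q = \frac{w_q^2}{2\,\bigl[H_{F_q}^{-1}\bigr]_{qq}},
  \qquad F_q = \{q, q+1, \dots, n\}.
\]
% Cholesky identity used by GPTQ/SparseGPT: with the upper Cholesky factor
% U of the inverse Hessian,
\[
  H^{-1} = U^{\top} U
  \;\Longrightarrow\;
  \bigl[H_{F_q}^{-1}\bigr]_{qq} = U_{qq}^{2},
\]
% so the code's metric equals the OBS saliency up to a constant factor:
\[
  \frac{w_q^2}{\operatorname{diag}(U)_q^2}
  = \frac{w_q^2}{\bigl[H_{F_q}^{-1}\bigr]_{qq}}
  = 2\,\delta L_q.
\]
```

So w^2/diag(Hinv)^2 in the code and w^2/[H^{-1}]_qq in the OBS/OBC/SparseGPT papers would rank weights identically, with the Cholesky diagonal additionally accounting for the sequential inverse-Hessian updates.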