Why is the importance of weights evaluated as w^2/(H_ii)^2, instead of w^2/(H_ii) as in SparseGPT?
Xingrun-Xing opened this issue · 2 comments
Xingrun-Xing commented
Why is the importance of weights evaluated as w^2/(H_ii)^2, instead of w^2/(H_ii) as in SparseGPT?
hahnyuan commented
I followed the implementation in their code: https://github.com/IST-DASLab/sparsegpt/blob/c3bbf613a1822229767f4d8870b933049b8bef15/sparsegpt.py#L96C21-L96C78
Xingrun-Xing commented
> I followed the implementation in their code: https://github.com/IST-DASLab/sparsegpt/blob/c3bbf613a1822229767f4d8870b933049b8bef15/sparsegpt.py#L96C21-L96C78
Thanks for your reply. However, from OBS/OBC/SparseGPT we know that delta_loss = w^2/(H_ii), not w^2/(H_ii)^2. Do you know why we should use w^2/(H_ii)^2 as the importance metric?
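For context, the difference between the two metrics can be made concrete with a toy sketch. This is only an illustration, assuming PyTorch and a made-up 8-dimensional Hessian built from random calibration activations; the variable names (`obs_saliency`, `code_saliency`) are mine, not from either repository. Note that the linked SparseGPT line divides by the squared diagonal of the upper Cholesky factor of the *inverse* Hessian, which is not the same quantity as H_ii:

```python
import torch

# Toy comparison of the two candidate saliency metrics discussed above.
# All sizes and names here are illustrative assumptions, not SparseGPT's API.
torch.manual_seed(0)
d = 8
X = torch.randn(d, 64)                    # fake calibration activations
H = X @ X.T + 1e-2 * torch.eye(d)         # damped proxy Hessian, H = X X^T + eps*I
w = torch.randn(d)                        # one row of weights

# Inverse Hessian and its upper Cholesky factor, mirroring the linked code path.
Hinv = torch.cholesky_inverse(torch.linalg.cholesky(H))
U = torch.linalg.cholesky(Hinv, upper=True)

# OBS-style saliency: w^2 / [H^{-1}]_ii (diagonal of the full inverse).
obs_saliency = w**2 / torch.diag(Hinv)

# Metric as in the linked SparseGPT line: w^2 / diag(U)^2, where U is the
# Cholesky factor of H^{-1}. diag(U)^2 is a conditional variance, so in
# general it differs from [H^{-1}]_ii, and the two metrics disagree.
code_saliency = w**2 / torch.diag(U)**2

print(obs_saliency)
print(code_saliency)
```

The two vectors coincide only in special cases (for instance, the first coordinate, where the Cholesky diagonal squared equals the inverse-Hessian diagonal), which is exactly why the choice of metric matters for the pruning order.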