hahnyuan/PB-LLM

Why is the importance of weights evaluated by w^2/(H_ii)^2, instead of w^2/(H_ii) as in SparseGPT?

Xingrun-Xing opened this issue · 2 comments


I followed the implementation in their code: https://github.com/IST-DASLab/sparsegpt/blob/c3bbf613a1822229767f4d8870b933049b8bef15/sparsegpt.py#L96C21-L96C78
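For context, here is a minimal, self-contained sketch of what the linked line computes. The shapes, the toy `W` and `X`, the damping term, and the 50% sparsity are illustrative assumptions, not values from either repo:

```python
import torch

torch.manual_seed(0)

# Toy layer: 4 output rows, 6 input columns (illustrative shapes only).
W = torch.randn(4, 6)

# SPD Hessian proxy H = 2 X X^T built from calibration inputs, as in SparseGPT;
# the small diagonal damping is added here for numerical stability.
X = torch.randn(6, 32)
H = 2 * X @ X.T + 1e-2 * torch.eye(6)

# SparseGPT precomputation: `Hinv` is the *upper Cholesky factor* of H^{-1},
# not H^{-1} itself.
Hinv = torch.linalg.cholesky(
    torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True
)

# The criterion from the linked line: importance = w^2 / diag(Hinv)^2.
importance = W ** 2 / torch.diag(Hinv).reshape(1, -1) ** 2

# Prune the fraction of weights with the lowest importance (True = pruned).
sparsity = 0.5
thresh = torch.sort(importance.flatten())[0][int(importance.numel() * sparsity)]
mask = importance <= thresh
print(mask.float().mean())  # ~0.5
```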

Thanks for your reply. But from OBS/OBC/SparseGPT, we know that delta_loss = w^2/(H_ii), not w^2/(H_ii)^2. Do you know why we should use w^2/(H_ii)^2 as the importance metric?
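A likely resolution, based on the Cholesky trick used by GPTQ/SparseGPT (hedged, since the thread itself does not spell this out): in the linked code, `Hinv1` is the upper Cholesky factor U of H^{-1}, not H^{-1} itself, and the square of its diagonal equals the diagonal of the inverse Hessian restricted to the weights not yet eliminated. Under that reading, the squared denominator in the code recovers exactly the OBS denominator:

```latex
% OBS saliency of pruning weight w_q; the constant 1/2 does not affect ranking:
\[
  \delta L_q = \frac{w_q^2}{2\,\bigl[H_{F_q}^{-1}\bigr]_{qq}},
  \qquad F_q = \{q, q+1, \dots, n\}.
\]
% Cholesky identity used by GPTQ/SparseGPT: with the upper Cholesky factor
% U of the inverse Hessian,
\[
  H^{-1} = U^{\top} U
  \;\Longrightarrow\;
  \bigl[H_{F_q}^{-1}\bigr]_{qq} = U_{qq}^{2},
\]
% so the code's metric equals the OBS saliency up to a constant factor:
\[
  \frac{w_q^2}{\operatorname{diag}(U)_q^2}
  = \frac{w_q^2}{\bigl[H_{F_q}^{-1}\bigr]_{qq}}
  = 2\,\delta L_q.
\]
```

So w^2/diag(Hinv)^2 in the code and w^2/[H^{-1}]_qq in the OBS/OBC/SparseGPT papers would rank weights identically, with the Cholesky diagonal additionally accounting for the sequential inverse-Hessian updates.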