horseee/LLM-Pruner

Calculating Importance of 'param_mix'

kiucho opened this issue · 2 comments

kiucho commented

Hello. First of all, thank you for sharing great research.

I have a question about calculating the importance of parameters.

In class TaylorImportance in hf_llama_pruner.py, line 274:

  1. Could you please tell me why the importance for the mixed-order case is calculated as follows:

salience = salience - 0.5 * layer.weight * layer.weight.acc_grad * layer.weight

(and not as the sum of the 1st- and 2nd-order terms)

  2. Is the higher-order term neglected?

horseee commented

Hi kiucho,

  • For Question 1:

The derivation of this (e.g., for Eq. 5 in the paper) is:

$$\mathcal{L}_{W_i} - \mathcal{L}_{W_i=0} = \frac{\partial \mathcal{L}}{\partial W_i} W_i - \frac{1}{2} W_i H_{ii} W_i + O\!\left(\|W_i\|^3\right)$$

where the Hessian diagonal is approximated by the accumulated squared gradients over the calibration samples (the acc_grad in the code):

$$H_{ii} \approx \sum_{n=1}^{N} \left( \frac{\partial \mathcal{L}_n}{\partial W_i} \right)^2$$

And thus, for this case, the second-order Hessian term is subtracted from the first-order term, which is exactly what the line of code above does.
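To make the mapping between the formula and the code explicit, here is a minimal sketch (assuming, as the snippet suggests, that layer.weight.grad holds the calibration gradient and layer.weight.acc_grad holds the accumulated squared gradients approximating the Hessian diagonal; the helper name is hypothetical):

```python
import torch

def param_mix_salience(weight: torch.Tensor,
                       grad: torch.Tensor,
                       acc_grad: torch.Tensor) -> torch.Tensor:
    # First-order Taylor term: (dL/dW) * W
    salience = grad * weight
    # Subtract the second-order term, with the Hessian diagonal H_ii
    # approximated by the accumulated squared gradients (acc_grad).
    salience = salience - 0.5 * weight * acc_grad * weight
    return salience
```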

There is a mistake in the first version of our paper, so please refer to our code. We uploaded a new version of the paper to arXiv (I'm not sure exactly when it will be released, but I expect it to be available within the next 24 hours).
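For completeness, here is a sketch of how acc_grad might be populated before pruning, under the assumption that it accumulates squared gradients over calibration batches (the empirical Fisher diagonal); the function name and the Hugging Face-style loss interface are assumptions, not the repository's exact code:

```python
import torch

def accumulate_squared_grads(model: torch.nn.Module, calibration_loader) -> None:
    # Accumulate per-parameter squared gradients into an `acc_grad` attribute,
    # approximating the Hessian diagonal with the empirical Fisher.
    for input_ids in calibration_loader:
        loss = model(input_ids, labels=input_ids).loss  # assumes a causal-LM API
        loss.backward()
        for p in model.parameters():
            if p.grad is None:
                continue
            if not hasattr(p, "acc_grad"):
                p.acc_grad = torch.zeros_like(p)
            p.acc_grad += p.grad.detach() ** 2
        model.zero_grad()
```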

  • For Question 2:

Yes, we can neglect the higher-order terms because their impact is negligible: they are small in scale compared to the preceding terms. This is primarily because the first-order term always dominates, given that the model consistently remains not fully converged on our calibration samples (as evidenced by the large loss during the pruning process).
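As a toy illustration of the scales involved (the numbers below are entirely made up and only meant to show why the first-order term dominates when the gradient is large relative to the weight):

```python
import torch

w = torch.tensor(0.02)  # assumed weight magnitude
g = torch.tensor(1.5)   # assumed gradient magnitude, far from convergence

first_order = g * w                 # ~3.0e-2
second_order = 0.5 * w * g**2 * w   # ~4.5e-4, roughly 67x smaller
print(first_order.item(), second_order.item())
```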

kiucho commented

Thank you for your kind explanation. I checked the new version of your paper. Thanks once again.