locuslab/wanda

How can the pruned model with sparse matrices save model size and computation cost?

JiachuanDENG opened this issue · 1 comment

Thank you so much for sharing this work and code!
But from my understanding, Wanda does not actually decrease the number of parameters in the model; instead, it simply sets many parameters to 0.

  1. In this way, the total number of parameters remains the same as in the original model, so the model size will be the same. I checked this by printing out the #params of the pruned and original models, and the counts are identical (see the sketch after this list).

  2. Although many parameters are set to 0, if no special optimization is applied in the forward pass, the zeros are still involved in the multiplication and addition operations, so the computation cost will be the same.
    Am I understanding this right?
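
To make both points concrete, here is a minimal sketch of the check, assuming PyTorch; the layer size and the 50% magnitude threshold are illustrative stand-ins for a pruned model, not taken from the Wanda code:

```python
import torch
import torch.nn as nn

# Illustrative layer standing in for one pruned layer of a model.
layer = nn.Linear(512, 512, bias=False)
w = layer.weight.data

# Point 1: unstructured pruning zeroes entries in place; the tensor
# shape (and therefore the parameter count) never changes.
n_params_before = w.numel()
threshold = w.abs().flatten().kthvalue(w.numel() // 2).values
w[w.abs() <= threshold] = 0.0
n_params_after = w.numel()
print(n_params_before, n_params_after)   # identical counts
print((w != 0).sum().item())             # nonzeros: roughly half

# Point 2: a dense forward pass still multiplies by every stored zero,
# so the FLOP count is unchanged unless a sparse kernel is used.
x = torch.randn(1, 512)
y = layer(x)
```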

Yes, you are right. This is how most researchers implement unstructured pruning methods (you can think of it as similar to how magnitude pruning is implemented in practice). Structured N:M sparsity, however, can deliver practical speedup.
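
For context, N:M sparsity constrains every contiguous group of M weights to have at most N nonzeros, a fixed pattern that sparse tensor cores (e.g., on NVIDIA Ampere GPUs) can skip over in hardware. Below is a minimal sketch of enforcing a 2:4 pattern by magnitude, assuming PyTorch; `apply_2_to_4_sparsity` is a hypothetical helper for illustration, not part of this repo:

```python
import torch

def apply_2_to_4_sparsity(weight: torch.Tensor) -> torch.Tensor:
    """Keep only the 2 largest-magnitude entries in every contiguous
    group of 4 (the N:M pattern with N=2, M=4). Illustrative helper."""
    groups = weight.reshape(-1, 4)               # view weights in groups of 4
    keep = groups.abs().topk(2, dim=1).indices   # 2 largest magnitudes/group
    mask = torch.zeros_like(groups)
    mask.scatter_(1, keep, 1.0)                  # 1.0 at the kept positions
    return (groups * mask).reshape(weight.shape)

w = torch.randn(8, 16)
w_24 = apply_2_to_4_sparsity(w)
# Every group of 4 now has exactly 2 nonzeros.
print((w_24.reshape(-1, 4) != 0).sum(dim=1))     # tensor of 2s
```

Because the nonzero positions follow this fixed, hardware-friendly layout, the kernel knows in advance which multiplications to skip; arbitrary unstructured masks offer no such guarantee, which is why they give no speedup on standard dense kernels.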