Question about perplexity results shown on the paper
moonlightian opened this issue · 1 comment
moonlightian commented
nailimixaM commented
Thanks @moonlightian, great observation - here's what we said in the main text:
> We note that the WikiText2 perplexity of SliceGPT at 50% is worse than SparseGPT 2:4, but the throughput is much higher than could be achieved with a sparse method that does not slice X.

(Here X denotes the activations flowing through the transformer.)
Structured sparsity methods (like SliceGPT) cannot outperform unstructured sparsity, all else being equal; instead, they trade off some perplexity for benefits like higher token throughput and a smaller memory footprint, which are not easily achieved with unstructured sparsity methods.
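To make the throughput point concrete, here is a minimal PyTorch sketch, not the SliceGPT implementation (the actual method applies an orthogonal transformation before deleting dimensions, and the sizes here are illustrative). It contrasts an unstructured-sparse weight, which still runs a full-size dense matmul on standard hardware, with a sliced weight, where the matmul itself shrinks:

```python
import torch

d, d_sliced, batch = 4096, 2048, 8  # hypothetical sizes: 50% slicing

W = torch.randn(d, d)   # original dense weight
X = torch.randn(batch, d)  # activations flowing through the transformer

# Unstructured sparsity: zero out ~50% of W's entries. The shapes are
# unchanged, so on standard hardware the matmul costs the same FLOPs
# as the dense one unless specialized sparse kernels are available.
mask = torch.rand_like(W) > 0.5
W_sparse = W * mask
y_sparse = X @ W_sparse  # full (d x d) matmul

# Structured slicing: delete whole rows/columns, so both the weight and
# the activations shrink and the matmul itself gets smaller. (SliceGPT
# rotates first so the deleted dimensions carry little signal; simply
# taking the leading dimensions here is an illustrative stand-in.)
W_sliced = W[:d_sliced, :d_sliced]  # smaller dense weight
X_sliced = X[:, :d_sliced]          # sliced activations
y_sliced = X_sliced @ W_sliced      # ~4x fewer FLOPs, plain dense kernels
```

Because the activations X are sliced along with the weights, every matmul in the network shrinks, which is where the dense-kernel throughput gain comes from; a 2:4 pattern keeps the shapes fixed and relies on specialized sparse kernels instead.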