Results of LLaMA-2 are different from Wanda
pprp commented
RocktimJyotiDas commented
Hi, thanks for the question. The numbers differ because the Wanda paper computes LLaMA-2 perplexity on WikiText with a sequence length of 4096, whereas GBLM-Pruner evaluates with a sequence length of 2048.
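To illustrate why the chosen sequence length changes the reported number, here is a minimal sketch of a WikiText-2 perplexity evaluation with a configurable `seqlen`. The model name, dataset split, and the exact chunking are illustrative assumptions; the actual evaluation code in Wanda and GBLM-Pruner may differ in details.

```python
# Sketch: WikiText-2 perplexity at a configurable sequence length.
# Assumes `transformers`, `datasets`, and a CUDA device are available.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer


@torch.no_grad()
def wikitext_perplexity(model, tokenizer, seqlen, device="cuda"):
    # Concatenate the raw test split and tokenize it once.
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
    input_ids = enc.input_ids.to(device)

    # Split the token stream into non-overlapping chunks of `seqlen`.
    nsamples = input_ids.shape[1] // seqlen
    nlls = []
    for i in range(nsamples):
        batch = input_ids[:, i * seqlen : (i + 1) * seqlen]
        # The causal-LM loss is the mean NLL over the chunk; rescale by
        # seqlen so every chunk contributes its total NLL.
        loss = model(batch, labels=batch).loss
        nlls.append(loss.float() * seqlen)

    return torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen))


if __name__ == "__main__":
    name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = (
        AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
        .to("cuda")
        .eval()
    )
    # Wanda reports LLaMA-2 perplexity at seqlen=4096; GBLM-Pruner uses 2048.
    for seqlen in (2048, 4096):
        print(seqlen, wikitext_perplexity(model, tokenizer, seqlen).item())
```

Because longer chunks give the model more context for each prediction, evaluating at 4096 typically yields a lower perplexity than evaluating at 2048, so the two papers' numbers are not directly comparable.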
pprp commented
Thank you for your answers. 😄