locuslab/wanda

Publish the Llama2 sparsified models

Opened this issue · 4 comments

Hi,

I was wondering if you plan to publicly release the sparsified Llama2 models. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.

Thanks!

Llama2-70b is quite large. Running our code repo on the Llama2 model released on Hugging Face should finish within minutes. Is there a reason for requesting a pruned model from our side?

The main reason is the resources required to actually prune the largest Llama2-70b model... does it need a modern GPU with large memory, or a DGX box? Either way, such resources are scarce these days...

Okay, I see. For LLaMA-2-70B, we used 5 or 6 (I don't recall the exact number) A6000 GPUs to load the model in fp16. There is a workaround if you only have one GPU with limited memory: load the model on the CPU in fp16 and move each layer/block to the GPU only while it is being pruned. I think this is what SparseGPT did.
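The offloading workaround above can be sketched roughly as follows. This is not the repo's actual code: a tiny stack of `nn.Linear` layers stands in for the transformer blocks, and simple magnitude pruning stands in for the Wanda criterion, but the memory pattern is the same — the full model stays on the CPU and only the block currently being pruned is moved to the GPU.

```python
import torch
import torch.nn as nn

def prune_block_(block: nn.Module, sparsity: float = 0.5) -> None:
    """In-place magnitude pruning: zero the smallest-|w| entries of each
    weight matrix until the requested sparsity is reached."""
    for p in block.parameters():
        if p.dim() < 2:          # skip biases / norm parameters
            continue
        k = int(p.numel() * sparsity)
        if k == 0:
            continue
        threshold = p.abs().flatten().kthvalue(k).values
        p.data[p.abs() <= threshold] = 0.0

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a large model loaded entirely on the CPU in fp16.
model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(4)])

for block in model:              # prune one block at a time
    block.to(device)             # only this block occupies GPU memory
    with torch.no_grad():
        prune_block_(block, sparsity=0.5)
    block.to("cpu")              # move it back; GPU memory is freed
```

With this pattern, peak GPU memory is bounded by the largest single block rather than the whole model, at the cost of CPU-to-GPU transfer time per block.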

I can try to see if it is possible to release the pruned LLaMA-2-70b models. Not sure if there might be some licensing issues. Stay tuned.

Thanks a lot, please let me know when/if you are able to release the LLaMA-2-70b models.