locuslab/wanda

Publish the Llama2 sparsified models

Opened this issue · 4 comments

Hi,

I was wondering if you plan to publicly release the sparsified Llama2 models. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.

Thanks!

Llama2-70b is quite large. Running our code repo on the Llama2 model released on Hugging Face should finish within minutes. Is there a reason for requesting a pruned model from our side?

The main reason is the resources required to actually prune the largest Llama2-70b model... does it need a modern GPU with large memory, or a DGX box? Either way, such resources are scarce these days...

Okay, I see. For LLaMA-2-70B, we used 5 or 6 (I don't recall the exact number) A6000 GPUs to load the model in fp16. There is a workaround if you only have one GPU with limited memory: load the model on the CPU in fp16 and move each layer/block to the GPU only while it is being pruned. I think this is what SparseGPT did.
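The offloading workaround above can be sketched roughly as follows. This is not the repo's actual code: a tiny stack of `nn.Linear` layers stands in for the transformer blocks, and simple magnitude pruning stands in for the Wanda criterion, but the memory pattern is the same — the full model stays on the CPU and only the block currently being pruned is moved to the GPU.

```python
import torch
import torch.nn as nn

def prune_block_(block: nn.Module, sparsity: float = 0.5) -> None:
    """In-place magnitude pruning: zero the smallest-|w| entries of each
    weight matrix until the requested sparsity is reached."""
    for p in block.parameters():
        if p.dim() < 2:          # skip biases / norm parameters
            continue
        k = int(p.numel() * sparsity)
        if k == 0:
            continue
        threshold = p.abs().flatten().kthvalue(k).values
        p.data[p.abs() <= threshold] = 0.0

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a large model loaded entirely on the CPU in fp16.
model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(4)])

for block in model:              # prune one block at a time
    block.to(device)             # only this block occupies GPU memory
    with torch.no_grad():
        prune_block_(block, sparsity=0.5)
    block.to("cpu")              # move it back; GPU memory is freed
```

With this pattern, peak GPU memory is bounded by the largest single block rather than the whole model, at the cost of CPU-to-GPU transfer time per block.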

I can try to see if it is possible to release the pruned LLaMA-2-70b models. Not sure if there might be some licensing issues. Stay tuned.

Thanks a lot, please let me know when/if you are able to release the LLaMA-2-70b models.