Publish the Llama2 sparsified models
Opened this issue · 4 comments
Hi,
I was wondering whether you plan to publicly release the sparsified Llama2 models. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.
Thanks!
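For context, 50% unstructured sparsity means half of the entries of each weight matrix are set to zero, with no constraint on where the zeros fall. A minimal magnitude-based sketch in NumPy (just an illustration of the sparsity pattern; the repo's actual pruning criterion may be activation-aware and differ from plain magnitude pruning):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of entries (unstructured)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole matrix.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned
```

Because the zeros can land anywhere, unstructured sparsity preserves accuracy better than structured patterns, but needs sparse kernels to see actual speedups.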
Llama2-70B is large, but running our code repo on the Llama2 model released on Hugging Face should finish within minutes. Is there a reason for requesting a pruned model from our side?
The main reason is the hardware required to actually prune the largest Llama2-70B model. Does it need a modern GPU with large memory, or a DGX box? Either way, such resources are scarce these days...
Okay, I see. For LLaMA-2-70B, we used 5 or 6 A6000 GPUs (I don't recall the exact number) to load the model in fp16. There is a workaround if you only have one GPU with limited memory: keep the model in CPU memory in fp16, and move each layer/block onto the GPU only while it is being pruned. I think this is what SparseGPT did.
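The offloading workaround can be sketched roughly as below in PyTorch. This is not the repo's actual code: the per-layer prune step here is a simple magnitude criterion standing in for whatever criterion the method really uses, and the `layers` dict of weight tensors is a simplified stand-in for iterating over a real model's transformer blocks:

```python
import torch

def prune_unstructured(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero the smallest-magnitude entries. A stand-in prune step;
    SparseGPT/activation-aware methods use different criteria."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # k-th smallest absolute value over the flattened tensor.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def prune_model_layerwise(layers: dict, device: str | None = None) -> dict:
    """Keep all weights resident on CPU; move one layer at a time to the
    GPU, prune it there, and move the result back. Peak GPU memory is
    then one layer, not the whole model."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    for name, w in layers.items():
        w_dev = w.to(device)                      # only this layer on GPU
        layers[name] = prune_unstructured(w_dev).to("cpu")
    return layers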
I can look into whether it is possible to release the pruned LLaMA-2-70B models; I'm not sure whether there are licensing issues. Stay tuned.
Thanks a lot, please let me know when/if you are able to release the LLaMA-2-70b models.