Support for loading sharded models
Qubitium opened this issue · 2 comments
Qubitium commented
Does BitBLAS currently support loading sharded models? For 4-bit quants of 70B+ models, Hugging Face enforces a 50GB upload limit per file, so without sharding it is hard to share quants of very large models. With Llama 3.1 405B dropping in the next few hours, we are preparing to upload a BitBLAS-compatible 4-bit GPTQ quant but are running into this sharding issue now. Thanks!
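For context, this is roughly what loading a sharded checkpoint involves on the Hugging Face side: an index JSON maps each tensor name to the shard file that contains it. Below is a minimal sketch assuming the standard safetensors sharded layout; `load_sharded_state_dict` is a hypothetical helper for illustration, not a BitBLAS or GPTQModel API:

```python
import json
import os

from safetensors import safe_open


def load_sharded_state_dict(model_dir: str) -> dict:
    """Load a sharded safetensors checkpoint by following its index file."""
    # The index file maps tensor names to shard filenames, e.g.
    # {"model.layers.0.self_attn.q_proj.qweight": "model-00001-of-00009.safetensors", ...}
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]

    state_dict = {}
    # Visit each shard file once and collect every tensor it holds.
    for shard_file in sorted(set(weight_map.values())):
        with safe_open(os.path.join(model_dir, shard_file), framework="pt") as shard:
            for name in shard.keys():
                state_dict[name] = shard.get_tensor(name)
    return state_dict
```

On the producing side, transformers' `save_pretrained(out_dir, max_shard_size="48GB")` writes shards under the 50GB limit together with the index file that the sketch above consumes.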
LeiWang1999 commented
Thanks for reporting, @Qubitium; let me take a look.
LeiWang1999 commented
Closed; see ModelCloud/GPTQModel#252 (comment).