/quantizeHFmodel

Accepts Hugging Face models, and automatically downloads, quantizes, and reuploads the model to an HF repo of your choice

Primary LanguagePythonApache License 2.0Apache-2.0

quantizeHFmodel

Accepts Hugging Face models, and automatically downloads and quantizes it.

Generates q4_k_m","q5_k_m", "q8_0" by default.

Remember to export your Hugging Face Token like so:

export HUGGING_FACE_HUB_TOKEN="YOUR_TOKEN"

An example of using the script is like:

python3 quantizeHFmodel.py fireworks-ai/firefunction-v1

I'm also hosting quantizeHQQ here - it does the same thing except quantizes with HQQ (https://github.com/mobiusml/hqq), theoretically yielding a better quality quant. However, this takes crazy amounts of VRAM to do, on the order of >100GB. I don't have that, but if this is useful to you, more power to you.