Is it possible to quantize a locally converted model instead of downloading from Hugging Face?
chigkim opened this issue · 1 comment
chigkim commented
I have converted the weights using this command:
python -m llama.convert_llama --ckpt_dir ../models --tokenizer_path ../models/tokenizer.model --model_size 7b --output_dir llama-hf
Now I'm trying to quantize the converted model produced by the command above.
If I run the following,
python -m llama.llama_quant c4 --ckpt_dir llama-hf/llama-7b --tokenizer_path llama-hf/tokenizer --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
it throws an error about a missing positional argument.
What should I put before c4 so that it uses the model on my hard drive instead of downloading from Hugging Face?
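For what it's worth, I'm guessing the missing positional argument is the model itself, and that a local directory path might be accepted in its place, something like the line below. This is just a guess on my part (the path llama-hf/llama-7b is the output directory from my convert step above); I haven't confirmed that llama_quant accepts local paths.
python -m llama.llama_quant llama-hf/llama-7b c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt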