juncongmoo/pyllama

Is it possible to quantize a locally converted model instead of downloading from Hugging Face?

chigkim opened this issue · 1 comment

I converted the weights using this command:

python -m llama.convert_llama --ckpt_dir ../models --tokenizer_path ../models/tokenizer.model --model_size 7b --output_dir llama-hf

Now I'm trying to quantize the converted model produced by the command above.
If I run the following,

python -m llama.llama_quant c4 --ckpt_dir llama-hf/llama-7b --tokenizer_path llama-hf/tokenizer --wbits 4 --groupsize 128 --save pyllama-7B4b.pt

it throws an error about a missing positional argument.

What should I put before c4 so it uses the model on my hard drive instead of downloading from Hugging Face?
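
My guess, based on the README examples where a Hugging Face repo ID appears right before c4 as the first positional argument, is that my local directory simply takes its place, something like:

python -m llama.llama_quant llama-hf/llama-7b c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt

but I haven't verified that a local path is accepted in place of a repo ID.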

Sorry, duplicate of #60.