juncongmoo/pyllama

Quantize Original LLaMA Model Files

htcml opened this issue · 3 comments

htcml commented

A bit confused here. In README.md, users are asked to download the LLaMA model files first, but the quantize examples use decapoda-research/llama-7b-hf. How do I quantize the downloaded LLaMA model files (for example, consolidated.00.pth for 7B)?

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt

Replace "decapoda-research/llama-7b-hf" with the path to a local HF-format model. You may need to convert the downloaded files first.

Use the following to convert:

python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH --model_size 7B --output_dir ./converted_meta --to hf --max_batch_size 4

Note that the conversion creates an odd directory structure, and I had issues locating the tokenizer during quantization. So after conversion I renamed llama-7b to 7B and ran:

cp -rf ./converted_meta/tokenizer/* ./converted_meta/7B/

Then run:

python3 -m llama.llama_quant ./converted_meta/7B c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
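
The rename and tokenizer copy described above can also be scripted. This is only a minimal sketch of those two manual steps: the function name `tidy_converted_dir` is hypothetical, and the folder names `llama-7b`, `tokenizer`, and `7B` are taken from this comment rather than from any documented pyllama layout.

```python
import shutil
from pathlib import Path


def tidy_converted_dir(output_dir, converted_name="llama-7b", target_name="7B"):
    """Rename the converted model folder and copy the tokenizer files into it.

    Folder names are assumptions based on the steps described in this thread.
    """
    root = Path(output_dir)
    src = root / converted_name
    dst = root / target_name

    # Step 1: rename llama-7b -> 7B, unless that was already done.
    if src.is_dir() and not dst.exists():
        src.rename(dst)
    if not dst.is_dir():
        raise FileNotFoundError(f"neither {src} nor {dst} exists")

    # Step 2: equivalent of `cp -rf tokenizer/* 7B/`.
    tokenizer_dir = root / "tokenizer"
    copied = []
    if tokenizer_dir.is_dir():
        for item in sorted(tokenizer_dir.iterdir()):
            target = dst / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
            copied.append(item.name)
    return dst, copied
```

Calling `tidy_converted_dir("./converted_meta")` after the conversion should leave `./converted_meta/7B` holding both the weights and the tokenizer files, which is the path passed to `llama.llama_quant` above.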

kruzel commented

Using the recommended convert command with the downloaded checkpoints fails with TypeError: 'NoneType' object is not subscriptable.
Any idea?

(venv) oferk@ironman:~/git/pyllama$ python3 -m llama.convert_llama --ckpt_dir pyllama_data/ --tokenizer_path pyllama_data/tokenizer.model --model_size 7B --output_dir ./converted_meta --to hf --max_batch_size 4

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/oferk/git/pyllama/venv/lib/python3.10/site-packages/llama/convert_llama.py", line 377, in <module>
    convert_llama_hf(args)
  File "/home/oferk/git/pyllama/venv/lib/python3.10/site-packages/llama/convert_llama.py", line 340, in convert_llama_hf
    write_model(
  File "/home/oferk/git/pyllama/venv/lib/python3.10/site-packages/llama/convert_llama.py", line 64, in write_model
    n_layers = params["n_layers"]
TypeError: 'NoneType' object is not subscriptable

Downloaded folder structure: [image]
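
The traceback shows that `params` is `None` when `write_model` indexes it, so the converter apparently did not load the model parameters. One possible cause, which is an assumption and not confirmed in this thread, is that the folder the converter reads from does not contain `params.json`. In the original LLaMA download, `params.json` and `consolidated.00.pth` sit together in a per-size folder such as `7B`. The sketch below is a pre-flight check for one folder; the helper name `check_ckpt_dir` is hypothetical and is not part of pyllama.

```python
import json
from pathlib import Path


def check_ckpt_dir(ckpt_dir):
    """Return a list of problems found in an original-format LLaMA checkpoint folder."""
    ckpt = Path(ckpt_dir)
    problems = []

    # params.json holds the hyperparameters, including "n_layers".
    params_file = ckpt / "params.json"
    if not params_file.is_file():
        problems.append(f"missing {params_file}")
    else:
        params = json.loads(params_file.read_text())
        if "n_layers" not in params:
            problems.append(f"{params_file} has no 'n_layers' key")

    # The weights are stored as consolidated.00.pth, consolidated.01.pth, ...
    if not list(ckpt.glob("consolidated.*.pth")):
        problems.append(f"no consolidated.*.pth shards in {ckpt}")

    return problems
```

Running it on both `pyllama_data/` and `pyllama_data/7B` shows which of the two actually holds the checkpoint files. Whether `--ckpt_dir` should point at the parent folder or at the size folder is not established here, so it is worth trying the other one if the check passes for only one of them.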