Killed
javierp183 opened this issue · 6 comments
Hello all, I installed the project's requirements, but when I try to execute the following command:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
I get the message "Killed". Could you help me diagnose and fix the issue? Thanks.
I get the same 'Killed' message when I run Single GPU inference without quantization on Linux:
python inference.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Hi, I think the problem is memory usage: without quantization the full model does not fit in RAM, so the kernel's OOM killer terminates the process, which prints "Killed".
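As a rough sanity check (a sketch, not an exact figure — it counts only the weights and ignores activations, the KV cache, and loading overhead), you can estimate how much memory the weights alone need at different precisions:

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory (GiB) needed just to hold the model weights."""
    return n_params * bits_per_weight / 8 / 1024**3

# LLaMA-7B has roughly 6.7e9 parameters.
print(f"fp16:  {weight_memory_gib(6.7e9, 16):.1f} GiB")  # ~12.5 GiB
print(f"2-bit: {weight_memory_gib(6.7e9, 2):.1f} GiB")   # ~1.6 GiB
```

So running the 7B model unquantized needs well over 12 GiB free; on a machine with less RAM (or VRAM) than that, the OOM kill is expected.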
I got the same error while running:
python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH --model_size 65B --output_dir ./converted_meta_hf_65 --to hf --max_batch_size 4
[1] 16261 killed python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path 65B