Killed
javierp183 opened this issue · 6 comments
Hello all, I installed the project's requirements, but when I try to execute the following command:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
I get the message "Killed". Could you help me diagnose and fix the issue? Thanks.
I get the same 'Killed' message when I run Single GPU inference without quantization on Linux:
python inference.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Hi, I think the problem is memory usage: without quantization the full model does not fit in RAM, so the kernel's OOM killer terminates the process, which prints "Killed".
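As a rough sanity check (a sketch, not an exact figure — it counts only the weights and ignores activations, the KV cache, and loading overhead), you can estimate how much memory the weights alone need at different precisions:

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory (GiB) needed just to hold the model weights."""
    return n_params * bits_per_weight / 8 / 1024**3

# LLaMA-7B has roughly 6.7e9 parameters.
print(f"fp16:  {weight_memory_gib(6.7e9, 16):.1f} GiB")  # ~12.5 GiB
print(f"2-bit: {weight_memory_gib(6.7e9, 2):.1f} GiB")   # ~1.6 GiB
```

So running the 7B model unquantized needs well over 12 GiB free; on a machine with less RAM (or VRAM) than that, the OOM kill is expected.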
I got the same error while running:
python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH --model_size 65B --output_dir ./converted_meta_hf_65 --to hf --max_batch_size 4
[1] 16261 killed python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path 65B