FMInference/FlexLLMGen

Issue with FlexGen when running a Python script


Description:

I encountered an issue when running the following command on a single RTX 3090:

bash:
python3 -m flexgen.flex_opt --model facebook/opt-30b --percent 0 100 100 0 100 0 --num-gpu-batches 2
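For reference, my understanding of the six --percent values (based on the FlexGen README; treat the exact per-field mapping as my assumption) is annotated below:

bash:
# Assumed --percent order:
#   weight-GPU% weight-CPU% cache-GPU% cache-CPU% activation-GPU% activation-CPU%
# So "0 100 100 0 100 0" keeps all weights on CPU, and the attention
# cache and activations entirely on the GPU.
python3 -m flexgen.flex_opt --model facebook/opt-30b \
    --percent 0 100 100 0 100 0 --num-gpu-batches 2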
The error message I received is:

error:
model size: 55.803 GB, cache size: 5.578 GB, hidden size (prefill): 0.058 GB
warmup - init weights
Load the pre-trained pytorch weights of opt-30b from huggingface. The downloading and cpu loading can take dozens of minutes. If it seems to get stuck, you can monitor the progress by checking the memory usage of this process.
Loading checkpoint shards: 43%|█████████████████████████████████████████▏ | 3/7 [03:34<04:48, 72.03s/it]Killed
I am trying to use FlexGen to run the model with offloading, but the process gets killed midway through loading the checkpoint shards. I am not sure why this is happening, and I would appreciate any help in resolving this issue.
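In case it is useful, this is roughly how I watch the memory usage of the loading process, as the log message suggests (a minimal sketch, assuming psutil is installed and the flexgen PID is passed as a command-line argument):

python:
import sys
import time
import psutil  # assumed available: pip install psutil

# Poll the resident memory of the flexgen process once per second
# until it exits, alongside overall system memory pressure.
pid = int(sys.argv[1])
proc = psutil.Process(pid)
while proc.is_running():
    rss_gb = proc.memory_info().rss / 1e9
    mem = psutil.virtual_memory()
    print(f"process RSS: {rss_gb:.1f} GB | system memory used: {mem.percent}%")
    time.sleep(1)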

Thank you!

The process was likely killed by the Linux out-of-memory (OOM) killer while the opt-30b weights were being loaded into CPU RAM. This should be fixed by #69, which has been merged into the main branch. Could you try again now?

Duplicate of #11.