QwenLM/Qwen-VL

[BUG] Two GPUs, 16 GB each (32 GB total), inference reports an out-of-memory error


是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

  • 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?

  • 我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

Two GPUs with 16 GB each, 32 GB of VRAM in total. By default the model loads on a single card; after changing device_map to "auto" it loads across both GPUs. However, after loading the qwen-vl-chat model, uploading an image in the UI, entering a prompt, and submitting, it again reports an out-of-memory error. Why is that? According to the README, fp16 inference should only need 28 GB of VRAM.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.89 GiB total capacity; 15.26 GiB already allocated; 11.88 MiB free; 15.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
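As a rough sanity check, the fp16 weights alone take close to 9 GiB per card when sharded across two GPUs, and the remaining headroom must also hold the ViT encoder activations and the KV cache, which is why a 15.89 GiB card can fill up at inference time even though loading succeeds. A minimal sketch, assuming a parameter count of roughly 9.6B and the Hugging Face `transformers` loading API; the `max_memory` budget of 13 GiB per card is an assumption to tune, not a recommended value:

```python
# Sketch of the memory arithmetic plus a capped multi-GPU load.
# All concrete numbers here (9.6e9 params, 13GiB caps) are assumptions.

def estimate_fp16_weight_gib(n_params: float) -> float:
    """fp16 stores 2 bytes per parameter."""
    return n_params * 2 / 1024**3

# Qwen-VL-Chat has roughly 9.6e9 parameters (assumption), so sharding the
# fp16 weights over two cards leaves each one holding about 9 GiB of weights.
per_gpu_weights_gib = estimate_fp16_weight_gib(9.6e9) / 2

def load_sharded():
    # Not executed here: capping max_memory below the card's 15.89 GiB
    # capacity leaves headroom for activations, the ViT encoder, and the
    # KV cache, instead of letting device_map="auto" pack the cards full.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-VL-Chat",
        device_map="auto",                     # shard layers across both GPUs
        max_memory={0: "13GiB", 1: "13GiB"},   # assumed budget; tune per card
        torch_dtype=torch.float16,
        trust_remote_code=True,
    ).eval()
```

Setting `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the environment before the process starts, as the error message suggests, can also reduce fragmentation-driven failures like the 20 MiB allocation above.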

期望行为 | Expected Behavior

The model should recognize the image and run inference normally.

复现方法 | Steps To Reproduce

No response

运行环境 | Environment

- OS: Ubuntu
- Python:3.10
- Transformers: 4.38.2
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.0

备注 | Anything else?

No response

The dataset also consumes VRAM, doesn't it?

@whysirier I've noticed something abnormal: VRAM usage keeps growing as the number of dialogue turns increases. Is that expected?


Yes, that does happen. I don't know how to release the VRAM, and the maintainers haven't provided a solution either.
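One pattern that bounds the per-turn growth (a sketch, not an official fix): clip the chat history passed back into `model.chat()` each turn, and release unused cached blocks with `torch.cuda.empty_cache()`. The `model.chat(tokenizer, query, history=...)` signature follows the Qwen-VL-Chat README; the `MAX_TURNS` window size is an assumption to tune against your VRAM budget:

```python
# Sketch: VRAM grows each turn because the full history is re-encoded
# (and its KV cache materialized) on every chat() call. Trimming history
# bounds the sequence length, so the per-call memory stops growing.

# Guarded import (assumption) so the sketch also runs without a GPU stack.
try:
    import torch
except ImportError:
    torch = None

MAX_TURNS = 4  # assumed history window; tune to your VRAM budget

def chat_with_bounded_history(model, tokenizer, query, history):
    """Call model.chat() with the history clipped to the last MAX_TURNS turns."""
    history = history[-MAX_TURNS:]
    response, history = model.chat(tokenizer, query=query, history=history)
    if torch is not None:
        torch.cuda.empty_cache()  # hand cached but unused blocks back to the driver
    return response, history
```

Note that `empty_cache()` only returns memory PyTorch has cached but is no longer using; memory still referenced by the live history tensors cannot be freed, which is why trimming the history is the part that actually caps the growth.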