QwenLM/Qwen-VL

[BUG] Two GPUs, 16 GB each (32 GB total), inference reports an out-of-memory error


是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

  • 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?

  • 我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

Two GPUs with 16 GB each, 32 GB of VRAM in total. By default the model loads on a single card; after changing device_map to "auto" it loads across both GPUs. However, after loading the qwen-vl-chat model, uploading an image in the UI, entering a prompt, and submitting, it again reports an out-of-memory error. Why is that? According to the README, fp16 inference should only need 28 GB of VRAM.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 15.89 GiB total capacity; 15.26 GiB already allocated; 11.88 MiB free; 15.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
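As a rough sanity check, the fp16 weights alone take close to 9 GiB per card when sharded across two GPUs, and the remaining headroom must also hold the ViT encoder activations and the KV cache, which is why a 15.89 GiB card can fill up at inference time even though loading succeeds. A minimal sketch, assuming a parameter count of roughly 9.6B and the Hugging Face `transformers` loading API; the `max_memory` budget of 13 GiB per card is an assumption to tune, not a recommended value:

```python
# Sketch of the memory arithmetic plus a capped multi-GPU load.
# All concrete numbers here (9.6e9 params, 13GiB caps) are assumptions.

def estimate_fp16_weight_gib(n_params: float) -> float:
    """fp16 stores 2 bytes per parameter."""
    return n_params * 2 / 1024**3

# Qwen-VL-Chat has roughly 9.6e9 parameters (assumption), so sharding the
# fp16 weights over two cards leaves each one holding about 9 GiB of weights.
per_gpu_weights_gib = estimate_fp16_weight_gib(9.6e9) / 2

def load_sharded():
    # Not executed here: capping max_memory below the card's 15.89 GiB
    # capacity leaves headroom for activations, the ViT encoder, and the
    # KV cache, instead of letting device_map="auto" pack the cards full.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-VL-Chat",
        device_map="auto",                     # shard layers across both GPUs
        max_memory={0: "13GiB", 1: "13GiB"},   # assumed budget; tune per card
        torch_dtype=torch.float16,
        trust_remote_code=True,
    ).eval()
```

Setting `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the environment before the process starts, as the error message suggests, can also reduce fragmentation-driven failures like the 20 MiB allocation above.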

期望行为 | Expected Behavior

The model should recognize the image and run inference normally.

复现方法 | Steps To Reproduce

No response

运行环境 | Environment

- OS: Ubuntu
- Python:3.10
- Transformers: 4.38.2
- PyTorch: 2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.0

备注 | Anything else?

No response

The dataset also consumes VRAM, doesn't it?

@whysirier I've noticed something abnormal: VRAM usage keeps growing as the number of dialogue turns increases. Is that expected?


Yes, that does happen. I don't know how to release the VRAM, and the maintainers haven't provided a solution either.
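One pattern that bounds the per-turn growth (a sketch, not an official fix): clip the chat history passed back into `model.chat()` each turn, and release unused cached blocks with `torch.cuda.empty_cache()`. The `model.chat(tokenizer, query, history=...)` signature follows the Qwen-VL-Chat README; the `MAX_TURNS` window size is an assumption to tune against your VRAM budget:

```python
# Sketch: VRAM grows each turn because the full history is re-encoded
# (and its KV cache materialized) on every chat() call. Trimming history
# bounds the sequence length, so the per-call memory stops growing.

# Guarded import (assumption) so the sketch also runs without a GPU stack.
try:
    import torch
except ImportError:
    torch = None

MAX_TURNS = 4  # assumed history window; tune to your VRAM budget

def chat_with_bounded_history(model, tokenizer, query, history):
    """Call model.chat() with the history clipped to the last MAX_TURNS turns."""
    history = history[-MAX_TURNS:]
    response, history = model.chat(tokenizer, query=query, history=history)
    if torch is not None:
        torch.cuda.empty_cache()  # hand cached but unused blocks back to the driver
    return response, history
```

Note that `empty_cache()` only returns memory PyTorch has cached but is no longer using; memory still referenced by the live history tensors cannot be freed, which is why trimming the history is the part that actually caps the growth.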