dvlab-research/LongLoRA

torch.cuda.OutOfMemoryError: CUDA out of memory

zhanglv0209 opened this issue · 3 comments

Hi, /mnt/nvme1n1/zhang/model/out/sft/llama2-Chinese-7b-Chat-qlore-20231117/model-merger is a model whose context has been extended to 32k. I run:
/mnt/nvme1n1/zhang/venv/small/bin/python inference-qlora.py --base_model /mnt/nvme1n1/zhang/model/out/sft/llama2-Chinese-7b-Chat-qlore-20231117/model-merger --question "Why doesn't Professor Snape seem to like Harry?" --context_size 32768 --max_gen_len 512 --flash_attn True --material "materials/test.txt"

Then it fails with: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.00 GiB. GPU 5 has a total capacty of 79.19 GiB of which 2.37 GiB is free. Process 43480 has 48.85 GiB memory in use. Including non-PyTorch memory, this process has 27.97 GiB memory in use. Of the allocated memory 22.45 GiB is allocated by PyTorch, and 4.22 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How should I handle this?
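
The last hint in the error message points at the caching allocator. A minimal sketch of acting on it, assuming the variable is set before torch initializes CUDA (the 128 MB value is an illustrative guess, not a tuned recommendation from this repo):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before CUDA is initialized, hence before
# importing torch. max_split_size_mb caps the block size the caching allocator
# will split, which can reduce fragmentation ("reserved but unallocated" memory).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # assumed value

import torch  # imported after setting the env var on purpose
```

Equivalently, the variable can be exported in the shell before launching inference-qlora.py.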
One more question: I downloaded chatglm3-6b-32k and fed it a context of about 20,000 Chinese characters, and it did not report running out of memory. What explains the difference?
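
One plausible explanation (an assumption, not something confirmed in this thread) is whether the full attention score matrix is ever materialized. A back-of-the-envelope estimate for a LLaMA-2-7B-style model at a 32k context:

```python
# Rough size of the attention score matrix for ONE layer of a LLaMA-2-7B-style
# model (32 heads) at a 32k context, if vanilla attention materializes the full
# seq_len x seq_len matrix per head in fp16. Illustrative assumption, not a
# measurement from this issue.
heads = 32
seq_len = 32768
bytes_fp16 = 2

scores = heads * seq_len * seq_len * bytes_fp16
print(f"{scores / 2**30:.0f} GiB per layer")  # 64 GiB, more than the 2.37 GiB free
```

Flash attention never materializes this matrix, so whether it is actually active, and whether chatglm3-6b-32k's implementation defaults to a memory-efficient attention kernel, would change the picture dramatically.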

Hi, what type of GPU are you using, and are you using flash-attention?
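
(For anyone debugging the same thing, a quick way to check both; a hypothetical sanity check, not a script from this repo:)

```python
import torch
import flash_attn  # raises ImportError if the package is not installed

print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A100 80GB PCIe"
print(flash_attn.__version__)
```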

Yes, that problem has been solved.


Hi, how did you solve it?