juncongmoo/pyllama

A question about single GPU inference

TitleZ99 opened this issue · 1 comment

Thanks for this great work. I'm wondering how to run inference on a single 8GB GPU, like the example shown in the README. I tried it on my RTX 2080 Ti with 11GB and got a CUDA out of memory error.

Same problem here. On a single GPU without quantization, the 7B model in fp32 should need about 4 bytes × 7B parameters = 28GB of memory.
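
For reference, a minimal back-of-envelope sketch of the weight-only footprint at different precisions (this counts weights only; activations, the KV cache, and framework overhead add more on top, which is why even fp16 can OOM on an 11GB card):

```python
# Rough weight memory for LLaMA-7B at different precisions.
# Assumption: nominal 7B parameter count; real usage is higher
# once activations and the KV cache are included.

NUM_PARAMS = 7e9  # LLaMA-7B

BYTES_PER_PARAM = {
    "fp32": 4.0,  # full precision: 4 bytes * 7B = 28 GB
    "fp16": 2.0,  # half precision: 14 GB
    "int8": 1.0,  # 8-bit quantization: ~7 GB
    "int4": 0.5,  # 4-bit quantization (e.g. GPTQ): ~3.5 GB
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = NUM_PARAMS * nbytes / 1e9
    print(f"{dtype}: {gb:.1f} GB for weights alone")
```

So fitting 7B inference into 8-11GB realistically requires loading the weights quantized (8-bit or lower) rather than in fp32/fp16.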