epfml/landmark-attention

How much VRAM do you need to run Inference?

FFFiend opened this issue · 0 comments

Hey there, I'd like to fork your repo and test it out on a couple of prompts. I'm assuming I can do that just by running the pipelines set up in run_test.py? If I'm wrong about this, could you clarify? Mainly, though, I'm wondering how much VRAM I need for this. The weights themselves are around 27 GB and the LLaMA model is 13 GB, so can I load everything onto a single 40 GB GPU?
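
For context, here's the back-of-the-envelope math I'm working from (a rough sketch, not from the repo; the parameter count is inferred from the 27 GB file size, and the dtypes are my assumption that the checkpoint is stored in fp32):

```python
# Rough VRAM estimate for loading a LLaMA-7B-class checkpoint.
# Assumption: a ~27 GB checkpoint at 4 bytes/param implies ~7e9 params,
# i.e. the weights are likely stored in fp32; loading in fp16 halves that.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_vram_gb(n_params: float, dtype: str) -> float:
    """Approximate GB needed for the weights alone (no activations / KV cache)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3

# ~7.2e9 params, inferred from the 27 GB checkpoint size
n_params = 27 * 1024**3 / BYTES_PER_PARAM["fp32"]

for dtype in ("fp32", "fp16"):
    print(f"{dtype}: ~{weight_vram_gb(n_params, dtype):.1f} GB")
# fp32: ~27.0 GB -- tight on a 40 GB card once activations are added
# fp16: ~13.5 GB -- fits comfortably, with headroom for the KV cache
```

So my guess is that if run_test.py loads the checkpoint in half precision (e.g. something like torch_dtype=torch.float16, if the repo goes through Hugging Face-style loading), a single 40 GB GPU should be enough. Does that match your setup?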

Thanks.