How much VRAM do you need to run Inference?
FFFiend commented
Hey there, I'd like to fork your repo and test it out on a couple of prompts, which I'm assuming I can do just by running the pipelines set up in run_test.py (if I'm wrong about this, could you clarify?). I'm wondering how much VRAM I need to do this: the weights themselves are around 27 GB and the LLaMA model is about 13 GB, so can I load everything onto a single 40 GB GPU?
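For reference, here's the rough back-of-the-envelope check I was planning to run before loading anything; it's just a sketch (the sizes and the overhead allowance are my own guesses, and it assumes PyTorch with a visible CUDA device):

```python
# Rough sanity check: does the combined footprint fit on one GPU?
# The size constants below are my approximations, not measured values.
import torch

WEIGHTS_GB = 27.0   # checkpoint from this repo (approximate)
LLAMA_GB = 13.0     # LLaMA weights (approximate)
OVERHEAD_GB = 2.0   # guessed allowance for activations / KV cache

free_bytes, total_bytes = torch.cuda.mem_get_info(0)
total_gb = total_bytes / 1024**3
needed_gb = WEIGHTS_GB + LLAMA_GB + OVERHEAD_GB

print(f"GPU 0 total VRAM: {total_gb:.1f} GB")
print(f"Estimated requirement: {needed_gb:.1f} GB")
print("Fits" if needed_gb <= total_gb else "Probably does not fit")
```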
Thanks.