abetlen/llama-cpp-python

Specify GPU Selection (e.g., CUDA:0, CUDA:1)


Hi,

Is there a way to specify which GPU to use for inference, such as restricting it to only cuda:0 or cuda:1 in the code? Or are there any workarounds for achieving this?

Thanks in advance.

You can use tensor_split=[1, 0, 0] to ignore cuda:1 and cuda:2 and keep everything on cuda:0.

Also set split_mode to none to improve performance, since the model stays on a single GPU.

Hi @ExtReMLapin, thanks for your reply!
I tried what you suggested but got stuck, so could you please elaborate in more detail?

They are arguments of the Llama class.
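Something like this, a minimal untested sketch: the model path and the three-GPU layout are just placeholders for your own setup.

```python
from llama_cpp import Llama
import llama_cpp

# Keep the whole model on cuda:0: give the other GPUs a zero share in
# tensor_split and disable splitting across devices with split_mode.
llm = Llama(
    model_path="./models/model.gguf",            # hypothetical path, point this at your GGUF file
    n_gpu_layers=-1,                             # offload all layers to the GPU
    tensor_split=[1, 0, 0],                      # all weights to cuda:0, nothing on cuda:1/cuda:2
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_NONE,  # don't split the model across GPUs
)

print(llm("Q: Name the planets in the solar system. A:", max_tokens=32)["choices"][0]["text"])
```

If I recall correctly, with split_mode set to none the main_gpu argument (default 0) is what ultimately picks the device, so make sure it matches the GPU you gave the non-zero share in tensor_split.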