I have a GPU and expect the model to run faster, but your code only runs on CPU? How do I change that?
alexhmyang opened this issue
alexhmyang commented
As for this line in the Requirements section of the README:

GPU is not used and is not required.

I have a GPU and expect the model to run faster, but your code only runs on CPU? How do I change that?
wafflecomposite commented
Until I make some updates, check out this fork:
https://github.com/sebaxzero/LangChain_PDFChat_Oobabooga
I haven't tried it myself, but it looks like it should be able to use the GPU.
LebToki commented
You could try this approach! I haven't tested it yet, though.

GPU-Based Approach
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Stream generated tokens to stdout
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="./models/stable-vicuna-13B.ggmlv3.q8_0.bin",
    stop=["### Human:"],
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=2048,
    n_batch=512,
    # LlamaCpp has no `device` argument; GPU offload is controlled by
    # n_gpu_layers. 40 covers all layers of a 13B model; lower it if
    # you run out of VRAM.
    n_gpu_layers=40,
)
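One caveat with the sketch above: n_gpu_layers only takes effect if llama-cpp-python itself was built with GPU (cuBLAS) support, e.g. installed with CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python. As a quick smoke test (the prompt below is made up, and it assumes the model path above is valid), you can call the LLM directly and watch the load log for an "offloading ... layers to GPU" message (exact wording varies by version):

# Hypothetical smoke test: with GPU offload active, llama.cpp reports how many
# layers were offloaded while loading the model, before printing the reply.
response = llm("### Human: What does n_gpu_layers control?\n### Assistant:")
print(response)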