Simple code to load models to GPU
RonanKMcGovern commented
Documentation request
Currently, the following code loads the model to the CPU:
import outlines
# No device argument is given, so the model loads on CPU by default
model = outlines.models.transformers("/kaggle/input/llama-3.1/transformers/8b-instruct/2")
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""
# Constrain generation to exactly one of the two labels
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
What is the easiest way to tweak this code so that the model loads onto two GPUs (two T4s, so no flash attention, and in float16)?
Are you willing to open a PR?
I would, but I don't know how to do this yet. Happy to do so if you can suggest the fix.
lapp0 commented
Can you please try outlines.models.transformers("your_model_uri", device="auto") and report back whether it correctly loads the model onto your two GPUs?
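For reference, a minimal sketch of what the full call might look like for the Kaggle setup above. It assumes outlines v0.x forwards device to transformers' device_map and passes model_kwargs through to AutoModelForCausalLM.from_pretrained; device_map="auto" and torch_dtype are standard transformers arguments, but the forwarding behavior is an assumption here, not confirmed:
import torch
import outlines

# Sketch, assuming device="auto" is forwarded as device_map="auto",
# which lets accelerate shard the weights across both visible T4s.
# torch_dtype=torch.float16 halves memory versus float32; T4s do not
# support bfloat16, so float16 is the appropriate half-precision type.
model = outlines.models.transformers(
    "/kaggle/input/llama-3.1/transformers/8b-instruct/2",
    device="auto",
    model_kwargs={"torch_dtype": torch.float16},
)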
RonanKMcGovern commented
Howdy, vLLM now supports outlines for offline inference, so I'm able to use that instead.
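For anyone landing here later, a minimal sketch of that vLLM route, assuming outlines v0.x's outlines.models.vllm forwards its keyword arguments to vllm.LLM (tensor_parallel_size and dtype are real vLLM parameters; the forwarding is the assumption):
import outlines

# Sketch: tensor_parallel_size=2 shards the model across both T4s,
# and dtype="float16" avoids bfloat16, which T4s (compute capability
# 7.5) do not support.
model = outlines.models.vllm(
    "/kaggle/input/llama-3.1/transformers/8b-instruct/2",
    tensor_parallel_size=2,
    dtype="float16",
)
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator("Is the following review positive or negative?\nReview: This restaurant is just awesome!")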