Simple code to load models to GPU
RonanKMcGovern commented
Documentation request
Currently, the following code loads the model to the CPU:
import outlines
# No device argument is given, so the model loads on CPU by default
model = outlines.models.transformers("/kaggle/input/llama-3.1/transformers/8b-instruct/2")
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is just awesome!
"""
# Constrain generation to exactly one of the two labels
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
What is the easiest way to tweak this code so that the model loads onto two GPUs (two T4s, so no flash attention, and in float16)?
Are you willing to open a PR?
I would, but I don't know how to do this yet. Happy to do so if you can suggest the fix.
lapp0 commented
Can you please try outlines.models.transformers("your_model_uri", device="auto") and report back whether it correctly loads the model onto your two GPUs?
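For reference, a minimal sketch of what the full call might look like for the Kaggle setup above. It assumes outlines v0.x forwards device to transformers' device_map and passes model_kwargs through to AutoModelForCausalLM.from_pretrained; device_map="auto" and torch_dtype are standard transformers arguments, but the forwarding behavior is an assumption here, not confirmed:
import torch
import outlines

# Sketch, assuming device="auto" is forwarded as device_map="auto",
# which lets accelerate shard the weights across both visible T4s.
# torch_dtype=torch.float16 halves memory versus float32; T4s do not
# support bfloat16, so float16 is the appropriate half-precision type.
model = outlines.models.transformers(
    "/kaggle/input/llama-3.1/transformers/8b-instruct/2",
    device="auto",
    model_kwargs={"torch_dtype": torch.float16},
)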
RonanKMcGovern commented
Howdy, vLLM now supports outlines for offline inference, so I'm able to use that instead.
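For anyone landing here later, a minimal sketch of that vLLM route, assuming outlines v0.x's outlines.models.vllm forwards its keyword arguments to vllm.LLM (tensor_parallel_size and dtype are real vLLM parameters; the forwarding is the assumption):
import outlines

# Sketch: tensor_parallel_size=2 shards the model across both T4s,
# and dtype="float16" avoids bfloat16, which T4s (compute capability
# 7.5) do not support.
model = outlines.models.vllm(
    "/kaggle/input/llama-3.1/transformers/8b-instruct/2",
    tensor_parallel_size=2,
    dtype="float16",
)
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator("Is the following review positive or negative?\nReview: This restaurant is just awesome!")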