dottxt-ai/outlines

Simple code to load models to GPU

Closed this issue · 2 comments

Documentation request

Currently, the following code loads the model to the CPU:

import outlines

model = outlines.models.transformers("/kaggle/input/llama-3.1/transformers/8b-instruct/2")

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is just awesome!
"""

generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)

What is the easiest way to tweak this code to load the model onto two GPUs (two T4s, so no flash attention, and float16 weights)?

Are you willing to open a PR?

I would but I don't know how to do this yet. Happy to do so if you can suggest the fix.

Can you please try outlines.models.transformers("your_model_uri", device="auto") and report back whether it correctly loads the model onto your two GPUs?
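For the float16 part of the question, a minimal sketch combining the suggestion above with a `torch_dtype` override, assuming this outlines version forwards `device` to transformers as `device_map` and passes `model_kwargs` through to `AutoModelForCausalLM.from_pretrained` (the guard clauses are only there so the snippet is safe to run on a machine without the model):

```python
import importlib.util
import os

MODEL_PATH = "/kaggle/input/llama-3.1/transformers/8b-instruct/2"

# Assumed settings for two T4s: device="auto" lets accelerate shard
# the weights across both GPUs (forwarded to transformers as
# device_map="auto"), and torch_dtype="float16" loads fp16 weights,
# since T4s support neither bfloat16 nor flash attention.
load_kwargs = dict(
    device="auto",
    model_kwargs={"torch_dtype": "float16"},
)

# Only attempt the load when outlines is installed and the Kaggle
# model directory actually exists on this machine.
if importlib.util.find_spec("outlines") is not None and os.path.isdir(MODEL_PATH):
    import outlines

    model = outlines.models.transformers(MODEL_PATH, **load_kwargs)
```

After loading, you can check where the weights landed with `model.model.hf_device_map` (a transformers attribute, assuming the sharded load succeeded).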