ndif-team/nnsight

Potential Feature Request - Simple Inference

Closed this issue · 2 comments

I wonder if others would be interested in a feature that allows for simple inference without a context manager.

Currently, the simplest way to run inference on a language model using nnsight is to use an invoke context. For example:

from nnsight import LanguageModel

model = LanguageModel(...)
prompt = "Hello"
# Enter an invoke context just to run the forward pass.
with model.invoke(prompt) as invoker:
    pass
# Read the output after the context exits.
output = invoker.output

A potential interface for this might look like output = model.simple_invoke(prompt) or output = model.invoke(prompt, trace=False).

The use case I imagine is quickly running inference to see what a model outputs for a particular prompt, without needing to access its internals.

Thanks for the consideration!

You should be able to do something similar by calling .local_model.

from nnsight import LanguageModel
import torch as t

# dispatch=True loads the model immediately rather than on first use.
model = LanguageModel("gpt2", device_map="auto", dispatch=True)
tokenizer = model.tokenizer

# Tokenize a prompt and call the underlying local model directly,
# bypassing the tracing/invoke machinery.
test = t.tensor(tokenizer.encode("test"))
logits = model.local_model(test)

I'll look into implementing a version that automatically tokenizes inputs and works well with a remote framework.
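For illustration, a wrapper along these lines could handle the tokenization automatically (a hypothetical sketch, not part of the nnsight API; simple_invoke and its signature are assumptions here):

import torch as t
from nnsight import LanguageModel

# Hypothetical helper, not part of nnsight: tokenize a prompt and run the
# underlying local model in one call, skipping the tracing machinery.
def simple_invoke(model: LanguageModel, prompt: str):
    # Encode the prompt into a (1, seq_len) tensor of input ids.
    input_ids = t.tensor([model.tokenizer.encode(prompt)])
    # Call the dispatched local HuggingFace model directly, as in the snippet above.
    with t.no_grad():
        return model.local_model(input_ids)

model = LanguageModel("gpt2", device_map="auto", dispatch=True)
output = simple_invoke(model, "Hello")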

@ericwtodd In 0.2, you can call the tracing context with an input and set trace=False to skip the with block and get the output directly:

from nnsight import LanguageModel

model = LanguageModel("gpt2", device_map="auto", dispatch=True)

# trace=False runs the model on the prompt and returns its output directly.
output = model.trace('Hello', trace=False)