ndif-team/nnsight

Dispatch Error When Using Quantisation


Description

I am seeing a dispatch error when using a 4-bit quantised model. First, note that this happens when instantiating a LanguageModel from an already-loaded transformers model in 4-bit. Also, note that 4-bit weights can only live on GPU and cannot be moved to CPU.
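
For reference, a quick way to confirm where the quantised weights live (a hypothetical check, not part of the original report) is to inspect the device map that accelerate records on the model:

from transformers import AutoModelForCausalLM

# Load gpt2 in 4-bit; with device_map="auto", accelerate records
# where each module was placed.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", device_map="auto", load_in_4bit=True
)

# Every entry should point at a CUDA device: bitsandbytes 4-bit
# weights cannot be offloaded to CPU.
print(model.hf_device_map)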

Working Example

from nnsight import LanguageModel

nnsight_model = LanguageModel("gpt2", device_map="auto", load_in_4bit=True)
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()

Failing Example

from nnsight import LanguageModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
nnsight_model = LanguageModel(model, tokenizer=tokenizer)
# Tracing the wrapped 4-bit model raises the dispatch error.
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()

Info

  • nnsight 0.2.11
  • torch 2.2.1+cu121
  • transformers 4.38.2
  • accelerate 0.29.1
  • bitsandbytes 0.43.0

The Error

The full error traceback can be found in this illustrative notebook: https://colab.research.google.com/drive/1n9A7MF8JE2lf26e9gOXRi2HaDjl4DjgX?usp=sharing

[Edit]

In fact, the first method only works for the first trace; for example, the following code fails on the second trace call:

from nnsight import LanguageModel

nnsight_model = LanguageModel("gpt2", device_map="auto", load_in_4bit=True)
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()
# The second trace on the same model raises the dispatch error.
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()
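
A possible workaround, sketched under the assumption that the failure comes from nnsight's scanning/validation pass (which runs the model on fake inputs) rather than from the real forward pass, is to disable that pass on the trace call:

from nnsight import LanguageModel

nnsight_model = LanguageModel("gpt2", device_map="auto", load_in_4bit=True)

# scan/validate execute the model on fake inputs to infer shapes;
# skipping them may avoid any extra dispatch of the quantised weights.
with nnsight_model.trace('The Eiffel Tower is in the city of',
                         scan=False, validate=False) as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()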