ndif-team/nnsight

Dispatch Error When Using Quantisation


Description

I am seeing a dispatch error when using a 4-bit quantised model. First, note that this happens when instantiating a LanguageModel from an already-loaded transformers model in 4-bit. Also, note that 4-bit weights can only live on GPU and cannot be moved to CPU.
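
For reference, a quick way to confirm where the quantised weights live (a hypothetical check, not part of the original report) is to inspect the device map that accelerate records on the model:

from transformers import AutoModelForCausalLM

# Load gpt2 in 4-bit; with device_map="auto", accelerate records
# where each module was placed.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", device_map="auto", load_in_4bit=True
)

# Every entry should point at a CUDA device: bitsandbytes 4-bit
# weights cannot be offloaded to CPU.
print(model.hf_device_map)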

Working Example

from nnsight import LanguageModel

nnsight_model = LanguageModel("gpt2", device_map="auto", load_in_4bit=True)
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()

Failing Example

from nnsight import LanguageModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
nnsight_model = LanguageModel(model, tokenizer=tokenizer)
# Tracing the wrapped 4-bit model raises the dispatch error.
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()

Info

  • nnsight 0.2.11
  • torch 2.2.1+cu121
  • transformers 4.38.2
  • accelerate 0.29.1
  • bitsandbytes 0.43.0

The Error

The full error traceback can be found in this illustrative notebook: https://colab.research.google.com/drive/1n9A7MF8JE2lf26e9gOXRi2HaDjl4DjgX?usp=sharing

[Edit]

In fact, the first method only works for the first trace; for example, the following code fails on the second trace call:

from nnsight import LanguageModel

nnsight_model = LanguageModel("gpt2", device_map="auto", load_in_4bit=True)
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()
# The second trace on the same model raises the dispatch error.
with nnsight_model.trace('The Eiffel Tower is in the city of') as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()
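
A possible workaround, sketched under the assumption that the failure comes from nnsight's scanning/validation pass (which runs the model on fake inputs) rather than from the real forward pass, is to disable that pass on the trace call:

from nnsight import LanguageModel

nnsight_model = LanguageModel("gpt2", device_map="auto", load_in_4bit=True)

# scan/validate execute the model on fake inputs to infer shapes;
# skipping them may avoid any extra dispatch of the quantised weights.
with nnsight_model.trace('The Eiffel Tower is in the city of',
                         scan=False, validate=False) as tracer:
    hidden_states = nnsight_model.transformer.h[0].mlp.act.output[0].clone().save()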