ndif-team/nnsight

TypeError when using CogVLM and passing scan=False

Zoeyyao27 opened this issue · 4 comments

I have come across a weird issue when I try to use the cogvlm2 model in nnsight: the trace fails with a TypeError.

Here is the code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from nnsight import LanguageModel

model_path = "THUDM/cogvlm2-llama3-chat-19B"
base_model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map="auto", cache_dir="cache"
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True, device_map="auto", cache_dir="cache"
)

if tokenizer.pad_token is None:
    # tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    if "Qwen" in model_path and tokenizer.eos_token is None:
        tokenizer.eos_token = "<|endoftext|>"
    tokenizer.pad_token = tokenizer.eos_token
    base_model.resize_token_embeddings(len(tokenizer))

model = LanguageModel(base_model, tokenizer=tokenizer, device_map="auto")

prompt = "apple"
all_attention_states = []
with model.trace(prompt, scan=False):
    for layer in model.model.layers:
        all_attention_states.append(layer.self_attn.output[0].save())

Here is the error:

Traceback (most recent call last):
  File "/data/yaoy/abstract/probe_temp.py", line 21, in <module>
    all_attention_states.append(layer.self_attn.output[0].save()) 
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/contexts/Runner.py", line 41, in __exit__
    raise exc_val
  File "/data/yaoy/abstract/probe_temp.py", line 21, in <module>
    all_attention_states.append(layer.self_attn.output[0].save()) 
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/envoy.py", line 445, in output
    self._output = self._tracer._graph.add(
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/tracing/Graph.py", line 150, in add
    value = target(
TypeError: 'str' object is not callable

If I pass scan=True instead, I get the following error:

Traceback (most recent call last):
  File "/data/yaoy/abstract/probe_temp.py", line 19, in <module>
    with model.trace(prompt):
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/models/NNsightModel.py", line 196, in trace
    runner.invoke(*inputs, **invoker_args).__enter__()
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/contexts/Invoker.py", line 69, in __enter__
    self.tracer._model._execute(
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/models/mixins/Generation.py", line 21, in _execute
    return self._execute_forward(prepared_inputs, *args, **kwargs)
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/nnsight/models/LanguageModel.py", line 281, in _execute_forward
    return self._model(
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/yaoy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B/2bf7de6892877eb50142395af14847519ba95998/modeling_cogvlm.py", line 649, in forward
    outputs = self.model(
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/yaoy/anaconda/envs/abstract/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/yaoy/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chat-19B/2bf7de6892877eb50142395af14847519ba95998/modeling_cogvlm.py", line 397, in forward
    assert not (token_type_ids == VISION_TOKEN_TYPE).any(), f"{(token_type_ids == VISION_TOKEN_TYPE).sum()}"
AssertionError: FakeTensor(..., device='cuda:0', size=(), dtype=torch.int64)

I looked into the code and I am a little confused. The target can be a string:

target: Union[Callable, str],

Why would we have the following code, which assumes target is callable? 🤔

value = target(

Does anyone know how to solve this problem? 🥹

@Zoeyyao27 Okay, so it seems this model has an assertion:

assert not (token_type_ids == VISION_TOKEN_TYPE).any(), f"{(token_type_ids == VISION_TOKEN_TYPE).sum()}"

which won't work with fake tensors.

You were doing the right thing by avoiding scanning, but you also need to turn off validation (which is like scanning, but for the interventions you define in your trace context).

So: with model.trace(prompt, scan=False, validate=False):
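
A minimal end-to-end sketch of that fix, applied to the loop from your original snippet (same model and prompt as above):

all_attention_states = []
# scan=False skips running the model with fake tensors,
# validate=False skips fake-tensor validation of the interventions,
# so the model's assertion on token_type_ids is never hit.
with model.trace(prompt, scan=False, validate=False):
    for layer in model.model.layers:
        # save() marks each layer's attention output to be kept after execution
        all_attention_states.append(layer.self_attn.output[0].save())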

Thank you for your reply. But if I print:

print(layer.self_attn.output[0].save())

I get:

LanguageModelProxy (getitem_1): <class 'inspect._empty'>

How can I obtain the attention states for each layer?

@Zoeyyao27 The model is only run, and values filled in, after you exit the tracing context. So:

with model.trace(..., scan=False, validate=False):
    attn_state = layer.self_attn.output[0].save()
    print(attn_state)

will print that it is a Proxy for a value, whereas:

with model.trace(..., scan=False, validate=False):
    attn_state = layer.self_attn.output[0].save()
print(attn_state)

will print the real value.
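
For completeness, a sketch of reading the saved states back out, assuming an nnsight version where a saved proxy exposes its resolved tensor via .value after the trace exits:

import torch

all_attention_states = []
with model.trace(prompt, scan=False, validate=False):
    for layer in model.model.layers:
        all_attention_states.append(layer.self_attn.output[0].save())

# After the context exits the trace has executed, so each saved proxy
# holds a real tensor of shape (batch, seq_len, hidden_size).
stacked = torch.stack([proxy.value for proxy in all_attention_states])
print(stacked.shape)  # (num_layers, batch, seq_len, hidden_size)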

Thank you for your reply! It worked!