ndif-team/nnsight

Allow using models that rely on keyword arguments rather than positional parameters, like Mistral


Currently, attribution patching with nnsight works for models that pass activations as positional parameters, such as GPT-2, but not for Mistral, which forwards them as explicit keyword arguments. As a result, nnsight cannot retrieve input activations for Mistral's layers and attention heads (they come back empty), and it cannot retrieve the lm_head outputs either. A good test case for a PR fixing this would be getting the Attribution Patching tutorial working for Mistral.
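A minimal sketch of the failure mode (the model name and module path are assumptions based on the standard HuggingFace Mistral layout, not taken from the tutorial verbatim):

```python
from nnsight import LanguageModel

# Assumed model checkpoint; any Mistral variant should behave the same way.
model = LanguageModel("mistralai/Mistral-7B-v0.1", device_map="auto")

with model.trace("The Eiffel Tower is in the city of"):
    # Works for GPT-2, where hidden states arrive as positional arguments,
    # but comes back empty for Mistral, which passes them as keyword arguments.
    layer_in = model.model.layers[0].input.save()
    head_out = model.lm_head.output.save()

print(layer_in)  # empty for Mistral under the old behavior
```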

This will likely require fixes to LanguageModel to support models like Mistral.

cc @JadenFiotto-Kaufman: not sure if I followed the issue creation guidelines for nnsight, but happy to elaborate more! :) Thanks for your work on this awesome package!

@arunasank Appreciate the issue! Should be all set now. `.input` now returns a tuple of length two, where the first index is the positional arguments and the second is the keyword arguments.
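A rough sketch of how layer inputs could be accessed under the new behavior (the module path and the `hidden_states` keyword name are assumptions based on the HuggingFace Mistral implementation):

```python
from nnsight import LanguageModel

# Assumed checkpoint; substitute whichever Mistral model you are patching.
model = LanguageModel("mistralai/Mistral-7B-v0.1", device_map="auto")

with model.trace("The Eiffel Tower is in the city of"):
    # .input is now (positional_args, keyword_args).
    layer_input = model.model.layers[0].input

    # For Mistral the hidden states are forwarded as a keyword argument,
    # so they live in the kwargs dict at index 1 rather than in the
    # positional tuple at index 0.
    hidden = layer_input[1]["hidden_states"].save()

    # lm_head outputs can be retrieved as usual.
    logits = model.lm_head.output.save()

print(hidden.shape, logits.shape)
```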