jessevig/bertviz

Support sequence classification (+ gradient visualization?)

pltrdy opened this issue · 1 comment

Hi,

First of all, thanks for this work; it's interesting, easy to use, and the documentation is clear.

I'm looking for ways to get insight into how an mBART model behaves in a sequence classification task (see https://huggingface.co/transformers/model_doc/mbart.html#transformers.MBartForSequenceClassification).

I plan to:

  1. First, visualize the attention: how could we modify this module to make that possible? (It shouldn't be too difficult?) See the sketch after this list.
  2. Visualize gradients: i.e., given a text, a target, and an expected target, compute the loss and visualize how each input word impacts the loss. Do you have any experience with such an approach? Do you think bertviz could be useful for visualizing gradients instead of attention?
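
For (1), here is a rough, untested sketch of what I have in mind, assuming a bertviz version whose `model_view` accepts encoder/decoder/cross attention tensors directly; the checkpoint name is just a placeholder:

```python
# Untested sketch: feed MBartForSequenceClassification attentions into
# bertviz's model_view, assuming its encoder-decoder interface
# (encoder_attention / decoder_attention / cross_attention arguments).
import torch
from bertviz import model_view
from transformers import MBartForSequenceClassification, MBartTokenizer

model_name = "facebook/mbart-large-cc25"  # placeholder checkpoint
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("An example sentence to classify.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# The decoder receives a shifted copy of the same sequence, so reusing
# `tokens` for the decoder side is only an approximation for display.
model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=tokens,
    decoder_tokens=tokens,
)
```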

Thanks!


Edit: The problem with MBartForSequenceClassification is that the input sequence is passed to both the encoder AND the decoder. I would need the target class as the decoder input in order to evaluate attention between the target class and the input. As it stands, the attention between the decoder and the encoder is essentially self-attention, since the two sequences are identical.
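
One untested workaround I'm considering: call the bare MBartModel inside the classifier with an explicit decoder sequence (a verbalized label), so the cross-attention relates the label tokens to the input instead of the input to itself. The checkpoint and label word below are placeholders:

```python
# Untested sketch: get cross-attention between a verbalized target class
# and the input by driving the decoder with the label tokens directly.
import torch
from transformers import MBartForSequenceClassification, MBartTokenizer

model_name = "facebook/mbart-large-cc25"  # placeholder checkpoint
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForSequenceClassification.from_pretrained(model_name)

enc = tokenizer("An example sentence to classify.", return_tensors="pt")
dec = tokenizer("positive", return_tensors="pt")  # placeholder label word

with torch.no_grad():
    outputs = model.model(  # the underlying MBartModel, no classification head
        input_ids=enc["input_ids"],
        decoder_input_ids=dec["input_ids"],
        output_attentions=True,
    )

# One (batch, heads, dec_len, enc_len) tensor per decoder layer:
# attention from the label tokens to the input tokens.
cross = outputs.cross_attentions
```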

Hi @pltrdy, my apologies for the delayed response, and thank you for the great suggestion! For saliency methods like the ones you describe, I would consider some of the other available tools, such as the Language Interpretability Tool (LIT) or Ecco.
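
For reference, here is a minimal, untested sketch of the gradient × input idea for your setup; this is not bertviz functionality, the checkpoint is a placeholder, and the classification head would need to be fine-tuned for the scores to be meaningful:

```python
# Untested sketch of gradient-x-input saliency for
# MBartForSequenceClassification, in the spirit of what LIT/Ecco provide.
import torch
from transformers import MBartForSequenceClassification, MBartTokenizer

model_name = "facebook/mbart-large-cc25"  # placeholder; use a fine-tuned classifier
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("An example sentence to classify.", return_tensors="pt")

# The shared embedding layer is called twice (once by the encoder, once by
# the decoder on a shifted copy of the same sequence); capture each output
# so its gradient can be read after the backward pass.
captured = []
def hook(module, hook_inputs, output):
    output.retain_grad()
    captured.append(output)
handle = model.get_input_embeddings().register_forward_hook(hook)

outputs = model(**inputs, labels=torch.tensor([1]))  # expected class (placeholder)
outputs.loss.backward()
handle.remove()

# Saliency w.r.t. the encoder input only (captured[0]); the decoder call
# (captured[1]) sees a shifted copy, so its positions would be misaligned.
emb = captured[0]
scores = (emb.grad * emb).sum(-1).squeeze(0)
scores = scores / scores.abs().sum()  # L1-normalize across tokens
for tok, s in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), scores):
    print(f"{tok:>15} {s.item():+.4f}")
```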