jessevig/bertviz

Cannot visualize enough input length on T5

chen-yifu opened this issue · 11 comments

Hi,

Thank you for this fascinating work.

I tried to visualize T5 attentions on a high-RAM Colab notebook with a TPU. It runs perfectly when the input is short. However, when the input is longer than a few sentences, the Colab notebook seems to crash.
My research project requires visualizing inputs of up to several paragraphs. Do you know if there is a way to make this work?

Thank you!
Yifu (Charles)

Hi @chen-yifu, here are a couple of possible workarounds for the scaling issue:

  1. Try running locally in a Jupyter notebook, though it sounds like that might not be possible in your case, and I'm not sure it would make enough of a difference.
  2. Pare down the effective size of the model prior to calling bertviz by removing the attention weights from all heads except for a small subset (e.g., one head in a particular layer); see the sketch at the end of this comment. The format of the attention is described in https://github.com/jessevig/bertviz/blob/master/bertviz/head_view.py. Let me know if you have any questions about this option. The disadvantage, of course, is that it will throw off the numbering of layers and heads in the visualization.
Thanks,
Jesse
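
P.S. Here's a rough sketch of option 2, in case it helps. It uses bert-base-uncased with head_view just to keep the example short; the layer/head indices are arbitrary, and the same slicing applies to each of T5's encoder, decoder, and cross-attention tuples:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from bertviz import head_view

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch_size, num_heads, seq_len, seq_len).
layer_idx, head_idx = 0, 0  # arbitrary choice for illustration

# Keep a single layer and a single head; slicing with head_idx:head_idx + 1
# preserves the 4-D shape that bertviz expects.
pruned_attention = [outputs.attentions[layer_idx][:, head_idx:head_idx + 1, :, :]]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(pruned_attention, tokens)  # much smaller payload passed to the visualization
```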

Hi @jessevig !
When I visualize just one layer, it successfully completed. One small issue is that the cell height does not display the full corpus (which is a few sentences long)... Do you know if I can modify the code anywhere to fix this?
Thank you so much again!
Yifu (Charles)

[Screenshot: the model view cuts off the corpus because of the limited cell height]

(note: the data shown in screenshot is fake)

Ah okay, I see what the issue is. As a workaround you could change the following line:

config.divHeight = config.numLayers * config.thumbnailHeight;

to something like:

config.divHeight = 10000;

However, you will have to install bertviz from source, as described here: https://github.com/jessevig/bertviz#installing-from-source (note that these instructions assume a local install).

Anyways, please let me know if that works for you. Thanks.

I'm running into a similar issue with the BigBird model. For its block sparse attention mechanism to operate properly, it needs a very long input; otherwise it behaves just like BERT.

I made some minor adjustments in model_view.js as you suggested (I decided to show half of the layers and attention heads), but is there any other way to do this, perhaps with a lightweight JS library? (I have no experience in JS.)

Hi, the workaround (config.divHeight = 10000) has worked! Thanks for the tip!

Okay, I've released a new version (1.2.0) where you can display a subset of layers/heads in the model view, e.g.:

model_view(attention, tokens, include_layers=[5, 6], include_heads=[2])

I've also fixed the bug that occurred when displaying only one or a few layers, so the expanded view now renders correctly. Please let me know if you have any other issues!

Thanks, it's very helpful!

Good to hear! Also please let me know if you have any other suggestions related to this feature.

@chen-yifu could you please share with us your notebook for visualizing T5 attention?
I'm trying to solve the same task, but I'm not sure how to proceed.

Thanks a lot!

Hi @mciniselli, you should be able to use this code for encoder-decoder models: https://github.com/jessevig/bertviz#encoder-decoder-models-bart-marianmt-etc

Just replace "Helsinki-NLP/opus-mt-en-de" with "t5-small" in both places (see the sketch below).

After the visualization appears, you should see a drop-down in the upper left corner allowing you to pick among the three forms of attention for this model (encoder, decoder, and cross attention).
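
For reference, here's a rough sketch of what the full code might look like after that substitution (the example texts are placeholders, and the include_layers/include_heads arguments are optional and require bertviz 1.2.0 or later):

```python
from transformers import AutoModel, AutoTokenizer
from bertviz import model_view

model_name = "t5-small"  # swapped in for "Helsinki-NLP/opus-mt-en-de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

encoder_text = "translate English to German: The house is wonderful."
decoder_text = "Das Haus ist wunderbar."

encoder_input_ids = tokenizer(encoder_text, return_tensors="pt").input_ids
# For a quick visualization we tokenize the target text directly;
# during training T5 would normally see shifted decoder inputs.
decoder_input_ids = tokenizer(decoder_text, return_tensors="pt").input_ids

outputs = model(input_ids=encoder_input_ids, decoder_input_ids=decoder_input_ids)

model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=tokenizer.convert_ids_to_tokens(encoder_input_ids[0]),
    decoder_tokens=tokenizer.convert_ids_to_tokens(decoder_input_ids[0]),
    # Optional: limit the layers/heads shown to keep longer inputs manageable.
    include_layers=[5],
    include_heads=[2],
)
```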

Please let me know if you have any issues.

Thank you very much for your help!