Tool for visualizing attention in BERT, GPT-2, XLNet, and RoBERTa. Extends the Tensor2Tensor visualization tool by Llion Jones and the pytorch-transformers library from HuggingFace.
Blog posts:
- Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention
- OpenAI GPT-2: Understanding Language Generation through Visualization
- Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters
Paper:
- A Multiscale Visualization of Attention in the Transformer Model
The attention-head view visualizes the attention patterns produced by one or more attention heads in a given transformer layer.
- BERT: [Notebook] [Colab]
- GPT-2: [Notebook] [Colab]
- XLNet: [Notebook] [Colab]
- RoBERTa: [Notebook] [Colab]
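For orientation outside the notebooks, here is a minimal sketch of how the head view is typically invoked. It assumes a recent BertViz release that exposes `head_view` together with the Hugging Face `transformers` package (with the older pytorch-transformers dependency the imports differ slightly); any attention-enabled checkpoint works the same way.

```python
# Minimal head-view sketch (assumptions: `head_view` from a recent BertViz
# release and the Hugging Face `transformers` package).
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "The cat sat on the mat"
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model(input_ids)
attention = outputs[-1]  # tuple of per-layer attention tensors, each (batch, heads, seq, seq)
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

head_view(attention, tokens)  # renders the interactive view in a Jupyter cell
```

Run this inside a Jupyter notebook cell; the view lets you pick a layer and hover over tokens to inspect the per-head attention weights.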
The model view provides a bird's-eye view of attention across all of the model's layers and heads.
- BERT: [Notebook] [Colab]
- GPT-2: [Notebook] [Colab]
- XLNet: [Notebook] [Colab]
- RoBERTa: [Notebook] [Colab]
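The call pattern mirrors the head view. Below is a sketch under the same assumptions (a recent BertViz release exposing `model_view` plus the Hugging Face `transformers` package), shown here with GPT-2:

```python
# Minimal model-view sketch (assumptions: `model_view` from a recent BertViz
# release and the Hugging Face `transformers` package).
from transformers import AutoTokenizer, AutoModel
from bertviz import model_view

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "The quick brown fox jumps over the lazy dog"
input_ids = tokenizer.encode(text, return_tensors="pt")
attention = model(input_ids)[-1]  # per-layer attention tensors
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

model_view(attention, tokens)  # grid of thumbnails, one per layer/head; click a cell to zoom in
```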
The neuron view visualizes the individual neurons in the query and key vectors and shows how they are used to compute attention.
- BERT: [Notebook] [Colab]
- GPT-2: [Notebook] [Colab]
- RoBERTa: [Notebook] [Colab]
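Because the neuron view needs access to the query and key vectors, it runs on the special model versions bundled with BertViz rather than on a stock Hugging Face model. The sketch below assumes the module layout of a recent BertViz release (`bertviz.transformers_neuron_view` and `bertviz.neuron_view.show`); module names may differ in older releases.

```python
# Minimal neuron-view sketch (assumption: module names follow a recent BertViz
# release; the bundled BertModel/BertTokenizer expose query/key vectors).
from bertviz.transformers_neuron_view import BertModel, BertTokenizer
from bertviz.neuron_view import show

model_type = "bert"
model_version = "bert-base-uncased"
model = BertModel.from_pretrained(model_version, output_attentions=True)
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=True)

sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
# Open the view at a specific layer/head; both can also be changed interactively.
show(model, model_type, tokenizer, sentence_a, sentence_b, layer=2, head=0)
```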
To run BertViz locally (see requirements.txt for dependencies):
```
git clone https://github.com/jessevig/bertviz.git
cd bertviz
jupyter notebook
```
Then open one of the sample notebooks from the Jupyter file browser.
When referencing BertViz, please cite the following paper:
```
@article{vig2019transformervis,
  author  = {Jesse Vig},
  title   = {A Multiscale Visualization of Attention in the Transformer Model},
  journal = {arXiv preprint arXiv:1906.05714},
  year    = {2019},
  url     = {https://arxiv.org/abs/1906.05714}
}
```
This project is licensed under the Apache 2.0 License; see the LICENSE file for details.
This project incorporates code from the following repositories:
- Tensor2Tensor (attention visualization tool by Llion Jones)
- pytorch-transformers (HuggingFace)