Transformer visualization via dictionary learning

This repo contains the code for the paper "Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors" by Zeyu Yun*, Yubei Chen*, Bruno A Olshausen, and Yann LeCun (DeeLIO Workshop@NAACL 2021).



The Demo is here: Demo


To visualize the hidden states as transformer factors, we first need to train a dictionary and then infer the sparse codes using this dictionary.
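To make the second step concrete, here is a minimal sketch of sparse-code inference for a fixed dictionary. It is not the code from this repo: the function name `infer_sparse_code`, the penalty `lam`, the toy dictionary `phi`, and the choice of a non-negative ISTA solver are all assumptions made for illustration. It models a hidden state `x` as a sparse, non-negative combination of dictionary columns (the "transformer factors").

```python
import numpy as np

def infer_sparse_code(x, phi, lam=0.1, n_steps=200):
    """Hypothetical helper: non-negative ISTA for
    min_a 0.5 * ||x - phi @ a||^2 + lam * sum(a), subject to a >= 0."""
    # Step size 1/L, where L is the squared spectral norm of phi.
    step = 1.0 / np.linalg.norm(phi, ord=2) ** 2
    a = np.zeros(phi.shape[1])
    for _ in range(n_steps):
        grad = phi.T @ (phi @ a - x)          # gradient of the reconstruction term
        a = np.maximum(a - step * (grad + lam), 0.0)  # shrink, then clip at zero
    return a

# Toy dictionary (an assumption): 16 unit-norm factors in an 8-dimensional space.
rng = np.random.default_rng(0)
phi = rng.standard_normal((8, 16))
phi /= np.linalg.norm(phi, axis=0, keepdims=True)

# A toy "hidden state" built from two factors.
x = 2.0 * phi[:, 3] + 1.0 * phi[:, 11]
a = infer_sparse_code(x, phi)

def objective(code, lam=0.1):
    return 0.5 * np.sum((x - phi @ code) ** 2) + lam * np.sum(code)
```

Each entry of `a` is the activation of one factor for this hidden state. In practice the dictionary would come from the training step rather than from random numbers.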

To train a dictionary,

run python

If you want to use your own data, put it in a Python list in which each element is a string (a sentence), save this list as a .npy file, and then run

python --training_data ./your_data.npy
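A data file of this kind could be prepared roughly as follows. This is a sketch assuming NumPy is installed. The example sentences are made up, and whether the training script expects exactly this array layout is an assumption.

```python
import numpy as np

# Hypothetical example data: one string per sentence.
sentences = [
    "The cat sat on the mat.",
    "Dictionary learning finds sparse factors.",
    "Transformers produce contextualized embeddings.",
]

# np.save stores the list as a 1-D array of unicode strings.
np.save("your_data.npy", np.array(sentences))

# Sanity check: reload the file and confirm that the sentences round-trip.
loaded = np.load("your_data.npy")
print(loaded.shape, loaded[0])
```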

To infer the sparse codes and save the top activated examples for each transformer factor, run

python --dictionary_dir ./the_path_for_your_trained_dictionary
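To illustrate what "top activated examples" means here, the sketch below picks, for each factor, the indices of the tokens with the largest sparse-code activations. The helper `top_activated` and the `(n_tokens, n_factors)` layout of `codes` are assumptions for illustration, not the repo's actual interface.

```python
import numpy as np

def top_activated(codes, k):
    """Hypothetical helper: for each factor (column of `codes`), return the
    indices of the k rows (tokens) with the largest activation, best first."""
    order = np.argsort(-codes, axis=0)[:k]  # shape (k, n_factors)
    return order.T                          # shape (n_factors, k)

# Toy activations: 3 tokens, 2 factors.
codes = np.array([
    [0.1, 0.9],
    [0.5, 0.0],
    [0.3, 0.4],
])
top = top_activated(codes, k=2)
print(top)
```

Each row of `top` lists the token positions one would look up in the original text to see which words and contexts most strongly activate that factor.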

(Optional) To use LIME to generate the attribution (color) map, run

python --dictionary_dir ./the_path_for_your_trained_dictionary --example_dir ./the_path_of_your_top_activated_examples


If you find this repo useful, please consider citing our work:

    title={Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors},
    author={Yun, Zeyu and Chen, Yubei and Olshausen, Bruno A and LeCun, Yann},
    booktitle = "Proceedings of Deep Learning Inside Out (DeeLIO) NAACL: The Second Workshop on Knowledge Extraction and Integration for Deep Learning Architectures",
    year = "2021",
    publisher = "Association for Computational Linguistics",