Fine-grained Interpretation and Causation Analysis in Deep NLP Models

This repository contains material related to the tutorial presented on June 6th, 2021 at NAACL. The presenters were Hassan Sajjad, Narine Kokhlikyan, Fahim Dalvi, and Nadir Durrani.

Slides

The PDF version of the slides is available here.

Video

The video recording of the tutorial can be viewed here.

Abstract

Deep neural networks have consistently pushed the state-of-the-art performance in natural language processing and are considered the de facto modeling approach for solving complex NLP tasks such as machine translation, summarization, and question answering. Despite the proven efficacy of deep neural networks at large, their opaqueness is a major cause of concern.

In this tutorial, we will present research work on interpreting fine-grained components of a neural network model from two perspectives: i) fine-grained analysis and ii) causation analysis. The former is a class of methods that analyze neurons with respect to a desired language concept or task. The latter studies the role of neurons and input features in explaining the decisions made by the model. We will also discuss how fine-grained interpretation and causation analysis can be connected to achieve better interpretability of model predictions. Finally, we will walk you through various toolkits that facilitate fine-grained interpretation and causation analysis of neural models.

Paper

The overview paper of the tutorial can be found here: link

Citation

If you would like to cite our tutorial, you can use the following citation:

@inproceedings{sajjad2021interpretiontutorial,
  title = {Fine-grained Interpretation and Causation Analysis in Deep {NLP} Models},
  author = {Hassan Sajjad and Narine Kokhlikyan and Fahim Dalvi and Nadir Durrani},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials},
  month = jun,
  year = {2021},
  address = {Online},
}