Integrated Gradients

(a.k.a. Path-Integrated Gradients, a.k.a. Axiomatic Attribution for Deep Networks)

Contact: integrated-gradients AT gmail.com

Contributors (alphabetical by last name):

Kedar Dhamdhere (Google)
Pramod Kaushik Mudrakarta (U. Chicago)
Mukund Sundararajan (Google)
Ankur Taly (Google Brain)
Jinhua (Shawn) Xu (Verily)

We study the problem of attributing the prediction of a deep network to its input features, as a step toward explaining individual predictions. For instance, in an object recognition network, an attribution method could tell us which pixels of the image were responsible for a particular label being picked; in a sentiment model, it could tell us which words in the sentence were indicative of strong sentiment.

Applications include helping developers debug a network, allowing analysts to explore its logic, and giving end users some transparency into the reasons for a network’s prediction.

Integrated Gradients is a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).
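For reference, the ICML 2017 paper (Sundararajan, Taly, Yan) defines the integrated gradient along input feature i for a model F, an input x, and a baseline input x' (for example, an all-black image). The baseline is part of the paper's definition rather than something stated elsewhere in this README. The definition, the Riemann-sum approximation with m steps used in practice, and the completeness property (which holds when F is differentiable almost everywhere) are:

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1} \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha
\;\approx\; (x_i - x'_i)\, \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F\big(x' + \tfrac{k}{m} (x - x')\big)}{\partial x_i}
$$

$$
\sum_i \mathrm{IG}_i(x) = F(x) - F(x')
$$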

Relevant papers and slide decks

Axiomatic Attribution for Deep Networks -- Mukund Sundararajan, Ankur Taly, Qiqi Yan, Proceedings of the International Conference on Machine Learning (ICML), 2017

This paper introduced the Integrated Gradients method. It presents an axiomatic justification of the method along with applications to various deep networks. Slide deck
Did the Model Understand the Question? -- Pramod Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere, Proceedings of the Association for Computational Linguistics (ACL), 2018

This paper discusses an application of integrated gradients for evaluating the robustness of question-answering networks. Slide deck

Implementing Integrated Gradients

This How-To document describes the steps involved in implementing integrated gradients for an arbitrary deep network.
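The core computation such an implementation needs fits in a few lines. The example below is a minimal sketch and not the repository's library code. It uses a hypothetical toy model F(x) = sigmoid(w · x) with a hand-written gradient (`toy_model`, `toy_model_grad`, and `WEIGHTS` are illustrative names). In a real network, the gradient would instead come from the framework's automatic differentiation. The sketch averages gradients at evenly spaced points on the straight line from a baseline x' to the input x, using a right Riemann sum as described in the ICML 2017 paper, and scales the result by x - x'.

```python
import math

# Hypothetical toy "network": F(x) = sigmoid(w . x), with a hand-written gradient.
WEIGHTS = [2.0, -1.0, 0.5]


def toy_model(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def toy_model_grad(x):
    # dF/dx_i = s * (1 - s) * w_i, where s = F(x).
    s = toy_model(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Right Riemann-sum approximation of integrated gradients.

    x, baseline: equal-length lists of floats.
    grad_fn: callable returning dF/dx (a list) at a given point.
    """
    total = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = float(k) / steps
        # Point on the straight-line path from the baseline to x.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    # Average path gradient, scaled by the input's difference from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attrs = integrated_gradients(x, baseline, toy_model_grad, steps=300)
# Completeness: the attributions should sum to roughly F(x) - F(baseline).
print(sum(attrs), toy_model(x) - toy_model(baseline))
```

Because of the completeness property, comparing the sum of the attributions with F(x) - F(x') is a useful sanity check on the number of steps. If the two values differ noticeably, increase `steps`.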

This repository provides code for implementing integrated gradients for networks with image inputs. It is structured as follows:

Integrated Gradients library: Library implementing the core integrated gradients algorithm.
Visualization library: Library implementing methods for visualizing attributions for image models.
Inception notebook: A Jupyter notebook for generating and visualizing attributions for the Inception (v1) object recognition network.
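The repository's visualization library has its own functions, which are not reproduced here. As an illustration only, the sketch below shows one common way to turn per-channel pixel attributions into a grayscale mask. This is an assumption and not necessarily what the library does. The sketch sums absolute attributions over color channels, then scales by a high percentile so that a few extreme pixels do not wash out the rest. `attribution_mask` and `clip_percentile` are hypothetical names.

```python
def attribution_mask(attributions, clip_percentile=99.0):
    """Map H x W x C attributions (nested lists) to an H x W mask in [0, 1]."""
    # Aggregate each pixel's attributions over color channels.
    per_pixel = [[sum(abs(a) for a in pixel) for pixel in row]
                 for row in attributions]
    values = sorted(v for row in per_pixel for v in row)
    # Use a high percentile of the per-pixel magnitudes as "full intensity".
    idx = int(round(clip_percentile / 100.0 * (len(values) - 1)))
    ceiling = float(values[idx])
    if ceiling <= 0.0:
        return [[0.0 for _ in row] for row in per_pixel]
    # Pixels at or above the ceiling saturate at 1.0.
    return [[min(v / ceiling, 1.0) for v in row] for row in per_pixel]


attrs = [[[1.0, -1.0, 0.0], [0.0, 0.0, 0.0]],
         [[0.5, 0.5, 0.0], [4.0, 0.0, 0.0]]]
print(attribution_mask(attrs, clip_percentile=50.0))
# [[1.0, 0.0], [0.5, 1.0]]
```

Summing absolute values makes the mask show how strongly each pixel contributed, regardless of sign. Signed variants, such as showing positive and negative attributions in separate colors, are an equally reasonable design choice.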

We recommend starting with the notebook. To run it, follow the steps below.

Clone this repository:

```
git clone https://github.com/ankurtaly/Integrated-Gradients.git
```

Change into the cloned repository directory and start the Jupyter notebook server:
```
cd Integrated-Gradients
jupyter notebook
```
Instructions for installing Jupyter are available in the Jupyter documentation. Please make sure that TensorFlow, NumPy, and PIL (which provides the PIL.Image module) are installed for Python 2.7.
Open ig_inception.ipynb and run all cells.
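If a cell fails on an import, a quick way to check which dependencies are missing is a small helper like the one below. It is a hypothetical sketch, not part of the repository. `missing_modules` and `REQUIRED` are illustrative names. The helper only checks that each module can be imported, not which version is installed. It runs under both Python 2.7 and Python 3.

```python
import importlib

# Module names for the notebook's stated dependencies.
REQUIRED = ["tensorflow", "numpy", "PIL.Image"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED)` returns an empty list when all three dependencies are importable. Otherwise it lists the modules that still need to be installed.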