Focus! Rating XAI Methods and Finding Biases with Mosaics

In [1] we propose a consistent evaluation metric for feature attribution methods, the Focus, designed to quantify how coherent their explanations are with the classification task. This repository contains the mosaics and the code needed to replicate the experiments in our paper: Focus! Rating XAI Methods and Finding Biases with Mosaics.
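
Intuitively, given a mosaic built from four images, two of which belong to the class being explained, the Focus measures the fraction of positive relevance that the attribution method places on the target-class images. The following is a minimal sketch in Python, not the repository's implementation; the function name focus and the array-based interface are hypothetical:

    import numpy as np

    def focus(relevance, target_mask):
        """Sketch of the Focus score for one mosaic.

        relevance   -- 2D array of per-pixel relevance for the target class
        target_mask -- boolean array of the same shape, True on pixels that
                       belong to the images of the target class
        """
        positive = np.clip(relevance, 0, None)  # keep positive relevance only
        total = positive.sum()
        if total == 0:  # degenerate case: no positive relevance at all
            return 0.0
        return float(positive[target_mask].sum() / total)

A Focus near 1.0 means the method concentrates its relevance on the target-class images, while a value around 0.5 on a half-target mosaic is what random attribution would yield.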

Six explainability methods have been evaluated:

  • SmoothGrad [10]: the implementation used is based on the work of Nam et al.
  • Layer-wise Relevance Propagation (LRP) [2]: the implementation used is based on the work of Nakashima et al.
  • GradCAM [9]: the implementation used is based on the work of Gildenblat et al.
  • LIME [7]: the implementation used is based on the work of Ribeiro et al.
  • GradCAM++ [3]: the implementation used is based on the work of Gildenblat et al.
  • Integrated Gradients (IG) [11]: the implementation used is based on the work of Kokhlikyan et al. [4].
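
As an illustration of the last item, Captum [4] exposes Integrated Gradients through a small object-oriented API. This is a minimal usage sketch, not the exact setup of our scripts; the model and input below are placeholders:

    import torch
    from torchvision import models
    from captum.attr import IntegratedGradients

    model = models.resnet18(pretrained=True).eval()  # any classifier works
    inputs = torch.rand(1, 3, 224, 224)              # placeholder mosaic batch

    ig = IntegratedGradients(model)
    # Attribution of the prediction for class index 0; the result has the
    # same shape as `inputs`
    attributions = ig.attribute(inputs, target=0, n_steps=50)
    relevance = attributions.sum(dim=1)  # collapse channels into one map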

Requirements

This code runs under Python 3.7.1. The Python dependencies are defined in requirements.txt.

Available mosaics

We provide already created mosaics from four different datasets:

  • Dogs vs. Cats (catsdogs)
  • ILSVRC 2012 [8] (ilsvrc2012)
  • MIT67 [6] (mit67)
  • MAMe [5] (mame)

How to run the experiments

We provide the bash scripts needed to compute the Focus for the different settings. Each execution has two steps:

  1. First, the explainability method is applied, and the resulting relevance matrices are saved in: $PROJECT_PATH/data/explainability/

  2. Second, the Focus is computed from the relevances obtained in the previous step.
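
Conceptually, step 2 reads those relevance matrices back and scores each mosaic with the Focus formula sketched above. The following is a minimal sketch assuming, hypothetically, one NumPy .npy file per mosaic with the target-class images on the top row; the actual on-disk format and mosaic layout used by the scripts may differ:

    import glob
    import numpy as np

    for path in glob.glob("data/explainability/*.npy"):
        relevance = np.load(path)
        # Hypothetical mask marking the two target-class images of the
        # 2x2 mosaic; here we pretend they sit on the top row
        target_mask = np.zeros(relevance.shape, dtype=bool)
        target_mask[: relevance.shape[0] // 2, :] = True
        positive = np.clip(relevance, 0, None)  # positive relevance only
        score = positive[target_mask].sum() / max(positive.sum(), 1e-12)
        print(path, score)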

To run both steps, execute the following bash scripts:

Step 1

cd $PROJECT_PATH/explainability/scripts/explainability_dataset/

sh explainability_dataset_architecture_method.sh

Step 2

cd $PROJECT_PATH/evaluation/scripts/evaluation_dataset/

sh evaluation_dataset_architecture_method.sh

where:

  • dataset must be replaced with one of the following options: catsdogs, ilsvrc2012, mit67 or mame.
  • architecture must be replaced with one of the following options: alexnet, vgg16 or resnet18.
  • method must be replaced with one of the following options: smoothgrad, lrp, gradcam, lime, gradcampp or intgrad.

For example, to compute the Focus for the Dogs vs. Cats dataset using the ResNet18 architecture and the GradCAM method, run the following:

Step 1

cd $PROJECT_PATH/explainability/scripts/explainability_catsdogs/

sh explainability_catsdogs_resnet18_gradcam.sh

Step 2

cd $PROJECT_PATH/evaluation/scripts/evaluation_catsdogs/

sh evaluation_catsdogs_resnet18_gradcam.sh
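
Since the scripts follow a uniform naming scheme, all combinations can be swept programmatically. The following is a minimal sketch in Python, assuming every dataset/architecture/method triple actually ships with a script, which may not hold for the full repository:

    import itertools
    import os
    import subprocess

    project = os.environ["PROJECT_PATH"]
    datasets = ["catsdogs", "ilsvrc2012", "mit67", "mame"]
    architectures = ["alexnet", "vgg16", "resnet18"]
    methods = ["smoothgrad", "lrp", "gradcam", "lime", "gradcampp", "intgrad"]

    for d, a, m in itertools.product(datasets, architectures, methods):
        # Step 1: compute and store the relevance matrices
        subprocess.run(
            ["sh", f"explainability_{d}_{a}_{m}.sh"],
            cwd=f"{project}/explainability/scripts/explainability_{d}",
            check=True,
        )
        # Step 2: compute the Focus from those relevances
        subprocess.run(
            ["sh", f"evaluation_{d}_{a}_{m}.sh"],
            cwd=f"{project}/evaluation/scripts/evaluation_{d}",
            check=True,
        )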

Cite

Please cite our paper when using this code.

@misc{ariasduart2021focus,
      title={Focus! Rating XAI Methods and Finding Biases with Mosaics}, 
      author={Anna Arias-Duart and Ferran Parés and Dario Garcia-Gasulla},
      year={2021},
      eprint={2109.15035},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

References

[1] Arias-Duart, A., Parés, F., & García-Gasulla, D. (2021). Focus! Rating XAI Methods and Finding Biases with Mosaics. arXiv preprint arXiv:2109.15035.

[2] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7), e0130140.

[3] Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V. N. (2018, March). Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE winter conference on applications of computer vision (WACV) (pp. 839-847). IEEE.

[4] Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., ... & Reblitz-Richardson, O. (2020). Captum: A unified and generic model interpretability library for pytorch. arXiv preprint arXiv:2009.07896.

[5] Parés, F., Arias-Duart, A., Garcia-Gasulla, D., Campo-Francés, G., Viladrich, N., Ayguadé, E., & Labarta, J. (2020). A Closer Look at Art Mediums: The MAMe Image Classification Dataset. arXiv preprint arXiv:2007.13693.

[6] Quattoni, A., & Torralba, A. (2009, June). Recognizing indoor scenes. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 413-420). IEEE.

[7] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144).

[8] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3), 211-252.

[9] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618-626).

[10] Smilkov, D., Thorat, N., Kim, B., Viégas, F., & Wattenberg, M. (2017). SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825.

[11] Sundararajan, M., Taly, A., & Yan, Q. (2017, July). Axiomatic attribution for deep networks. In International Conference on Machine Learning (pp. 3319-3328). PMLR.