This is the repository for the paper "SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability". It proposes two evaluation metrics for the robustness of interpretations, from the worst-case and probabilistic perspectives, respectively. Popular XAI methods such as Integrated Gradients, LRP, and DeepLift are supported for evaluation.
Requires a Linux platform with Python 3.8.5. We recommend using Anaconda to create a virtual environment. The `requirements.txt` file lists the Python packages required to run the code. Follow the steps below to install them:
- Create the virtual environment and install the necessary packages:

  ```shell
  conda create -n eval_xai --file requirements.txt
  ```

- Activate the virtual environment:

  ```shell
  conda activate eval_xai
  ```
- `model`: directory containing scripts for training the test models
- `checkpoints`: directory containing saved checkpoints for pre-trained test models
- We only include a pre-trained test model for the MNIST dataset due to the file size limit. For other datasets, please train the test models first.
- You may get the error `zipfile.BadZipFile: File is not a zip file` when downloading the CelebA dataset, because Google Drive enforces a daily download quota per file. Try to manually download the dataset from here, unzip it, and move it to the `Datasets/celeba` folder.
The tool supports XAI robustness evaluation and test model training via the following commands.
You can quickly run the worst-case robustness evaluation on the interpretation by Gradient x Input, using a Genetic Algorithm as the optimizer:

```shell
python main.py --eval_metric ga
```
or the probabilistic robustness evaluation on the interpretation by Gradient x Input, using Subset Simulation as the sampling method:

```shell
python main.py --eval_metric ss
```
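For context, the Gradient x Input interpretation that both evaluations target multiplies each input feature by the gradient of the model output with respect to that feature. Below is a minimal illustrative sketch (not the repository's implementation; the linear model and the example weights are hypothetical) showing the idea on a hand-coded linear model, where the gradient is available in closed form:

```python
import numpy as np

def gradient_x_input_linear(w, b, x):
    """Gradient x Input attribution for a linear model f(x) = w @ x + b.

    For a linear model, the gradient of the output with respect to each
    input feature is simply the corresponding weight, so the attribution
    of feature i is w[i] * x[i].
    """
    grad = w          # df/dx = w for a linear model
    return grad * x   # element-wise Gradient x Input

# Hypothetical 3-feature example.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([2.0, 3.0, 0.5])

attr = gradient_x_input_linear(w, b, x)
print(attr)            # per-feature attributions: [ 1. -3.  1.]
print(attr.sum() + b)  # for linear models, attributions + bias recover f(x)
```

For a deep network, the closed-form gradient is replaced by autograd (e.g. backpropagating the output logit to the input), but the attribution rule is the same element-wise product.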