
Opportunities and challenges in explainable artificial intelligence (XAI): A survey



Abstract

  • The black-box nature of deep neural networks challenges their use in mission-critical applications, raising ethical and judicial concerns and inducing a lack of trust.
  • Explainable Artificial Intelligence (XAI) is a field of Artificial Intelligence (AI) that promotes a set of tools, techniques, and algorithms that can generate high-quality interpretable, intuitive, human-understandable explanations of AI decisions.
  • The survey is based on published research from the years 2007 to 2020.

Introduction

  • The large number of parameters in DNNs makes them complex to understand and undeniably harder to interpret, regardless of the cross-validation accuracy or other evaluation metrics.
  • DL models could inherently learn or fail to learn representations from the data which a human might consider important.
  • Explaining the decisions made by DNNs requires knowledge of their internal operations, which non-AI-experts and end-users, who are more focused on getting accurate solutions, typically lack.
  • Explanations should make the AI algorithm expressive to improve human understanding, confidence in decision making, and promote impartial and just decisions.
    ➔ transparency, trust, and fairness
  • A collection of AI models, such as decision trees and rule-based models, is inherently interpretable.
    ➔ However, they are affected by the Interpretability-versus-Accuracy trade-off compared to deep learning models.

Taxonomies and organization

  • General categorization of the survey in terms of scope, methodology, and usage.

Definitions and Preliminaries

  • General concept of explainable AI : suite of techniques and algorithms designed to improve the trustworthiness and transparency of AI systems.
    ➔ In most model inference scenarios, the model f is treated as a blob of information which takes an input x and generates an output ŷ.
  • Def. 1 : Interpretability is a desirable quality or feature of an algorithm which provides enough expressive data to understand how the algorithm works.
  • Def. 2 : Interpretation is a simplified representation of a complex domain, such as outputs generated by a machine learning model, to meaningful concepts which are human-understandable and reasonable.
  • Def. 3 : An explanation is additional meta information, generated by an external algorithm or by the machine learning model itself, to describe the feature importance or relevance of an input instance towards a particular output classification.
    ➔ g is an object of the same shape as the input which describes the feature importance or relevance of each particular dimension to the class output.
  • Def. 4 : If the model parameters θ and the model architecture information are known, the model is considered a white-box.
  • Def. 5 : A deep learning model f is considered a black-box if the model parameters and network architectures are hidden from the end-user.
  • Def. 6 : A deep learning model is considered transparent if it is expressive enough to be human-understandable.
  • Def. 7 : Trustability of deep learning models is a measure of confidence, as end-users, that the model works as intended when facing a given problem.
  • Def. 8 : Bias in deep learning algorithms indicates the disproportionate weight, prejudice, favor, or inclination of the learnt model towards subsets of data due to both inherent biases in human data collection and deficiencies in the learning algorithm.
  • Def. 9 : Fairness in deep learning is the quality of a learnt model in providing impartial and just decisions without favoring any populations in the input data distribution.

Why Is Research on XAI Important?

  • The most important concerns are three-fold: 1) trustability, 2) transparency, and 3) bias and fairness of AI algorithms.
    1. Improves Trust
    ➔ A scientific explanation or logical reasoning for a sub-optimal decision is better than a highly confident decision without any explanations.
    2. Improves Transparency
    ➔ Creates a human-understandable justification for the decisions and can help find and deter adversarial examples.
    ➔ Transparency is important to assess the quality of output predictions and to ward off adversaries.

    • An image of a Panda is predicted as a Gibbon with high confidence after the original Panda image is tampered with by adding adversarial noise.
      ➔ The attacked image is visually similar to the original image, and humans are unable to perceive the changes.

    3. Improves Model Bias Understanding and Fairness
    XAI promotes fairness and helps mitigate biases introduced to the AI decision either from input datasets or poor neural network architecture.
    XAI techniques can be used to improve expressiveness and generate meaningful explanations of feature correlations for many subspaces in the data distribution in order to understand fairness in AI. By tracing output prediction discriminations back to the input using XAI techniques, we can understand the subset of features correlated to particular class-wise decisions.

Scope for explanation

Local Explanations

  • Locally explainable methods focus on a single input data instance to generate explanations by utilizing the different data features.
  • Heatmaps, rule-based methods, Bayesian techniques, and feature importance matrices
    ➔ attribution maps, graph-based, and game-theory based models

1. Activation Maximization

  • Interpreting layer-wise feature importance of a CNN model is simpler in the first layer, which generally learns low-level textures and edges. However, as we move deeper into the CNN, the importance of specific layers towards a particular prediction becomes hard to summarize and visualize, since the parameters of subsequent layers are influenced by those of the previous layers.
    ➔ Focus on input patterns which maximize a given hidden unit activation.
  • Activation map (see the objective sketch below):
    ➔ After the optimization converges, we can either average all local minima to obtain an explanation map g, or pick the one which maximizes the activations.
    ➔ The goal is to minimize the activation-maximization loss by finding larger filter activations correlated to specific input patterns.
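
A minimal sketch of the activation maximization objective, reconstructed from the description above; the notation is an assumption, with a_ij(θ, x) the activation of unit i in layer j and λ an L2 regularization weight:

```latex
% Find the input pattern x* that maximally activates a chosen hidden unit,
% with an L2 penalty keeping the optimized input well-behaved.
x^{*} = \arg\max_{x} \; a_{ij}(\theta, x) - \lambda \lVert x \rVert_{2}^{2}
```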

2. Saliency Map Visualization

  • Compute the gradient of the output class score with respect to an input image. By visualizing the gradients, a fair summary of pixel importance can be obtained by studying the positive gradients, which have more influence on the output.
  • Techniques of visualization : 1) class model visualizations and 2) image-specific class visualizations
    ➔ Image-specific class saliency maps use a gradient-based attribution method (a minimal sketch follows).
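
A minimal PyTorch sketch of such a gradient-based saliency map; `model` (a trained classifier returning class scores) and `x` (a preprocessed image tensor of shape `[1, C, H, W]`) are assumed names for illustration:

```python
import torch

def saliency_map(model, x, target_class):
    """Vanilla gradient saliency: |d score_c / d x|, reduced over channels."""
    model.eval()
    x = x.clone().requires_grad_(True)   # track gradients w.r.t. the input
    scores = model(x)                    # forward pass: unnormalized class scores
    scores[0, target_class].backward()   # backprop the target class score
    # Pixel importance: maximum absolute gradient across colour channels
    return x.grad.abs().max(dim=1)[0].squeeze(0)
```

Visualizing only the positive gradients (instead of the absolute value used here) matches the idea above of studying the pixels that push the class score up.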

3. Layer-wise Relevance BackPropagation (LRP)

  • Finds relevance scores for individual features in the input data by decomposing the output predictions of the DNN.
  • The propagation follows a strict conservation property, whereby the relevance received by a neuron must be fully redistributed to the layer below.
    ➔ For a simple NN, R(zj) denotes the relevance of an activation output zj; the goal is to redistribute the output relevance backwards, layer by layer, until a final relevance score is obtained for each individual input dimension of x (a sketch of the propagation rule is given below).
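
A sketch of a common LRP propagation rule (the z-rule) and the conservation property the notes refer to; the notation is an assumption, with z_ij = x_i·w_ij the contribution of neuron i to neuron j:

```latex
% Each neuron i receives the share of relevance proportional to its
% contribution to every neuron j in the layer above.
R_{i} = \sum_{j} \frac{z_{ij}}{\sum_{i'} z_{i'j}} \, R_{j}, \qquad z_{ij} = x_{i} w_{ij}

% Conservation: the total relevance is preserved from the output down to the input.
\sum_{i} R_{i} = \dots = \sum_{j} R_{j} = f(x)
```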

4. Local Interpretable Model-Agnostic Explanations (LIME)

  • To derive a representation that is understandable by humans, LIME tries to find the importance of contiguous superpixels (patches of pixels) in a source image towards the output class. Hence, LIME finds a binary vector x' ∈ {0, 1} representing the presence or absence of each contiguous patch or 'superpixel' that contributes most strongly towards the class output.
  • g ∈ G : the explanation as a model from a class of potentially interpretable models G. Explanation complexity is measured by Ω(g). πx(z) : a proximity measure between two instances x and z around x, L(f, g, πx) : faithfulness of g in approximating f in locality defined by πx.
    • Explanation ξ for the input data sample x is given by the LIME equation (reconstructed below):
  • We can make predictions on new 'fake' data using the complex model f; the fake samples depend on which superpixels are kept from the original data. The most descriptive features can then be picked as those which best explain the predictions on the permuted data.
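
The LIME objective referenced above, reconstructed from the definitions of L, Ω, and πx (this follows the standard formulation of the LIME paper):

```latex
% Choose the interpretable model g that is locally faithful to f (small loss L)
% while remaining simple (small complexity \Omega).
\xi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_{x}) + \Omega(g)
```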

5. SHapley Additive exPlanations (SHAP)

  • A game-theoretically optimal solution using Shapley values for model explainability
  • SHAP explains predictions of an input x by computing individual feature contributions towards that output prediction.
  • A data feature can be an individual category in tabular data or a superpixel group in images, similar to LIME. SHAP then frames the problem as an additive feature attribution method in which the explanation is a linear function of features.
    • φj ∈ R is the feature attribution for feature j; g(z') is the sum of a bias term and the individual feature contributions, s.t. (see the sketch below):
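
A sketch of the additive feature attribution form used by SHAP; z' ∈ {0, 1}^M is the simplified binary feature vector, M the number of features, and φ0 the base value (the notation is an assumption):

```latex
% The explanation model is a linear function of binary feature indicators.
g(z') = \phi_{0} + \sum_{j=1}^{M} \phi_{j} \, z'_{j}
```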

Global Explanation

  • Various globally explainable methods reduce complex deep models to linear counterparts which are easier to interpret. Rule-based and tree-based models such as decision trees are inherently globally interpretable.
  • Globally explainable methods work on an array of inputs to summarize the overall behavior of the black-box model.
  • Explanation gf describes the feature attributions of the model as a whole and not just for individual inputs.
    ➔ Understand the general behavior of the model f on large distributions of input and previously unseen data

1. Global Surrogate Models

  • A way to approximate the predictions of highly non-linear AI models with an interpretable linear model or a decision tree
  • “How generalized is my AI model?”, “How do variations of my AI model perform?”
  • A general use case of surrogate models in deep learning is to extract feature-rich layer embeddings for test inputs and train a linear classifier on those embeddings. The coefficients of the linear model can give insight into how the model behaves (a minimal sketch follows this list).
  • SHAP, LIME
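
A minimal scikit-learn sketch of a global surrogate; `black_box` (any fitted classifier with a `predict` method) and `X` (an input matrix, e.g. raw features or layer embeddings) are assumed names, not from the paper:

```python
from sklearn.tree import DecisionTreeClassifier

def fit_surrogate(black_box, X, max_depth=3):
    """Fit an interpretable tree to mimic the black box and report its fidelity."""
    y_hat = black_box.predict(X)                       # labels produced by the black box
    surrogate = DecisionTreeClassifier(max_depth=max_depth)
    surrogate.fit(X, y_hat)                            # approximate f globally
    fidelity = (surrogate.predict(X) == y_hat).mean()  # agreement with the black box
    return surrogate, fidelity
```

For the embedding use case above, X would be the extracted layer embeddings, and the surrogate could equally be a linear classifier whose coefficients are inspected.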

2. Class Model Visualization

  • Given a trained ConvNet f and a class of interest c, the goal is to generate image visualizations which are representative of c.
    ➔ This is based on the scoring method used to train f: generate the image which maximizes the class score Sc(I) for c, s.t. (see the objective sketch below):

    ➔ The generated images provide insight into what the black-box model has learnt for a particular class in the dataset.
    ➔ Numerically computed images use the class-model visualization method to generate images representing the target class.
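
A sketch of the class model visualization objective mentioned above, following the standard gradient-ascent formulation; Sc(I) is the unnormalized class score of image I for class c and λ an L2 regularization weight (notation assumed):

```latex
% Generate the image I* that maximizes the class score for c, with L2 regularization.
I^{*} = \arg\max_{I} \; S_{c}(I) - \lambda \lVert I \rVert_{2}^{2}
```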

3. LIME Algorithm for Global Explanations

  • LIME can provide a global understanding of the model from individual data instances by selecting a non-redundant set of local explanations that summarizes the global decision boundary of the machine learning model.

4. Concept Activation Vectors (CAVs)

  • Interprets the internal states of a neural network in a human-friendly concept domain.

  • Layer activations zj of layer j of f are calculated for both positive and negative concept examples. A binary classifier is then trained on these activations to distinguish the positive concepts PC from the negative concepts N; the CAV is the vector orthogonal to the classifier's decision boundary.

  • Testing with CAVs (TCAV)
    ➔ Uses directional derivatives, similar to gradient-based methods, to evaluate the sensitivity of the class predictions of f to changes in a given input towards the direction of concept C at a specific layer j (see the sketch after this list).
    ➔ TCAV process : (a) describe random concepts and examples, (b) labelled examples from training data, (c) trained neural network, (d) linear model segregating the activations extracted from specific layers in the neural network for the concepts and random examples, and (e) finding conceptual sensitivity using directional derivatives.
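
A sketch of the conceptual sensitivity and TCAV score; fl(x) denotes the layer-l activations, hl,k maps those activations to the logit of class k, vC is the CAV for concept C at layer l, and Xk is the set of class-k inputs (the notation is an assumption):

```latex
% Conceptual sensitivity: directional derivative of the class-k logit
% along the concept direction v_C at layer l.
S_{C,k,l}(x) = \nabla h_{l,k}\big(f_{l}(x)\big) \cdot v_{C}^{l}

% TCAV score: fraction of class-k inputs whose sensitivity to concept C is positive.
\mathrm{TCAV}_{C,k,l} = \frac{\left|\{\, x \in X_{k} : S_{C,k,l}(x) > 0 \,\}\right|}{|X_{k}|}
```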

5. Spectral Relevance Analysis (SpRAy)

  • Spectral clustering algorithm on local explanations provided by LRP to understand the decision-making process of the model globally. By analyzing the spatial structure of frequently occurring attributions in LRP instances, SpRAy identifies normal and abnormal behavior of machine learning models.

6. Global Attribution Mapping

  • Global attribution mapping finds a pair-wise rank distance matrix and clusters the attributions by minimizing a cost function of cluster distances.

7. Neural Additive Models (NAMs)

  • Trains multiple deep neural networks in an additive fashion such that each neural network attends to a single input feature.
  • Uses deep neural networks to learn non-linear patterns and feature jumps which traditional tree-based GAMs cannot learn (the additive form is sketched below).
    ➔ Interpretable NAM architecture for binary classification
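
A sketch of the additive form that NAMs parameterize, where each shape function fi is its own neural network, β is a bias term, and g is a link function (a sigmoid for binary classification); the notation is an assumption:

```latex
% Generalized additive model: each input feature contributes through its own network.
g\big(\mathbb{E}[y]\big) = \beta + f_{1}(x_{1}) + f_{2}(x_{2}) + \dots + f_{K}(x_{K})
```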

Differences in the Methodology

  • Methods are divided into those which focus on changes or modifications to the input data and those which focus on the model architecture and parameters.

Perturbation-Based

  • Explanations generated by iteratively probing a trained machine learning model with different variations of the inputs generally fall under perturbation based XAI techniques.
  • Perturbations can be on a feature level by replacing certain features by zero or random counterfactual instances, picking one or group of pixels (superpixels) for explanation, blurring, shifting, or masking operations.
  • Methods trying to understand neuronal activities and the impact of individual features on a corresponding class output via input perturbations can be categorized into this group of methods.

    ➔ The input image given to a deep learning model is perturbed using various randomized masks. A confidence score is obtained for each masked input, and a final saliency map is generated using a weighting function (a minimal sketch follows).
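
A minimal PyTorch sketch of this randomized-masking idea (simplified, in the spirit of RISE rather than an exact reproduction); `model`, `x`, and the masking scheme are assumptions for illustration:

```python
import torch

def perturbation_saliency(model, x, target_class, n_masks=500, p_keep=0.5):
    """Weight random binary masks by the class confidence they preserve."""
    model.eval()
    _, _, h, w = x.shape
    saliency = torch.zeros(h, w)
    with torch.no_grad():
        for _ in range(n_masks):
            mask = (torch.rand(1, 1, h, w) < p_keep).float()           # random binary mask
            score = torch.softmax(model(x * mask), dim=1)[0, target_class]
            saliency += score * mask[0, 0]                             # confidence-weighted mask
    return saliency / n_masks
```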

BackPropagation- or Gradient-Based

  • Gradient-based explainability methods utilize the backward pass of information flow in a neural network to understand neuronal influence and relevance of the input x towards the output.

    ➔ Segmentation results by using GradCAM output as a seed

Desiderata of Gradient-based Methods

  • Four desirable axioms that a gradient-based method should follow to improve gradient-based XAI :
  1. Sensitivity
    ➔ For every input and baseline that differ in only one feature but have different predictions, the differing feature should be given a non-zero attribution.
  2. Implementation invariance
    ➔ Two networks are functionally equivalent if their outputs are equal for all inputs, despite having very different implementations.
    ➔ The attributions are always identical for two functionally equivalent networks.
  3. Completeness
    ➔ Attributions should add up to the difference between output of model function f for the input image and another baseline image.
  4. Linearity
    ➔ If a model is a linear combination of two networks, the attributions for the model should be the same linear combination of the attributions of the two networks.
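
Integrated Gradients (which appears later in the case study) is a gradient-based method designed to satisfy these axioms; a sketch of its attribution for feature i and of the completeness property, with x' a baseline input (notation assumed):

```latex
% Accumulate gradients along the straight-line path from the baseline x' to the input x.
\mathrm{IG}_{i}(x) = (x_{i} - x'_{i}) \int_{0}^{1}
  \frac{\partial f\big(x' + \alpha (x - x')\big)}{\partial x_{i}} \, d\alpha

% Completeness: attributions sum to the difference between the two model outputs.
\sum_{i} \mathrm{IG}_{i}(x) = f(x) - f(x')
```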

Model usage or implementation level

Model Intrinsic

  • Model-intrinsic explainability algorithms: the explainability is baked into f itself such that f is naturally explainable.
  1. Trees and Rule-based Models
    ➔ Shallow rule-based models such as decision trees and decision lists are inherently interpretable.
    ➔ LIME, SHAP, BRL
  2. Generalized additive models (GAMs)
    ➔ For certain models, GAMs often require millions of decision trees to provide accurate results using the additive algorithms.
  3. Sparse LDA and Discriminant Analysis
    ➔ Sparse Penalized Discriminant Analysis (SPDA), functional Magnetic Resonance Imaging (fMRI)

Post-Hoc

  • Explaining pre-trained classifier decisions requires algorithms that look at AI models as black or white boxes. Black box means the XAI algorithm doesn't know the internal operations and model architecture; in white-box XAI, algorithms have access to the model architecture and layer structures.
  • Most post-hoc XAI algorithms are hence model-agnostic s.t. the XAI algorithm will work on any network architectures.
  • For example, an already trained well established neural network decision can be explained without sacrificing the accuracy of the trained model.
    ➔ Post-hoc model explainability algorithms: the explainability algorithm is applied to f such that f is made explainable externally.

Evaluation methodologies, issues, and future directions

Evaluation schemes

  • System Causability Scale (SCS), Benchmarking Attribution Methods (BAM), Faithfulness and Monotonicity, Human-grounded Evaluation Benchmark

Software Packages

A Case-study on Understanding Explanation Maps

  • LIME and SHAP use segmented superpixels to understand feature importance, while gradient-based saliency maps, Integrated Gradients, LRP, DeepLIFT, and Grad-CAM use backpropagation-based feature importance at the pixel level.

Limitations of XAI Visualizations and Future Directions

  • Flaws of XAI visualizations and interpretability techniques
  1. the inability of human attention to deduce XAI explanation maps for decision-making, and
  2. the unavailability of a quantitative measure of completeness and correctness of the explanation map.
  • Flaws in current gradient-based techniques.
    ➔ Adversarial attacks involving small perturbations to the input layer of a neural network.
    ➔ Small perturbations don't affect the accuracy of predictions; however, feature importance maps are highly affected by these small changes.

To study deeper...

  • A timeline of seminal works towards explainable AI algorithms