Concept Attribution: Explaining CNN decisions to physicians

This repository contains the main code and links to the datasets needed to replicate the experiments in the paper "Concept Attribution: Explaining CNN decisions to physicians", published in Computers in Biology and Medicine, Volume 123, August 2020, 103865.

Highlights

Feature attribution explains CNNs in terms of the input pixels.

The abstraction of feature attribution to higher level impacting factors is hard.

Concept attribution explains CNNs with high-level concepts such as clinical factors.

Nuclei pleomorphism is shown as a relevant factor in breast tumor classification.

Concept attribution can match clinical expectations to the interpretability of CNNs.

Datasets

Three of the four datasets used for the experiments are publicly available and can be downloaded at the following links:

http://yann.lecun.com/exdb/mnist/

https://camelyon17.grand-challenge.org/Data/

https://nucleisegmentationbenchmark.weebly.com/dataset.html

Regression Concept Vectors: RCV-tool library

With this library you will be able to apply concept attribution to your task. The main steps are:

Extraction of concept measures
Finding the vector representing the concept in the activation space
Generating concept-based explanations

1. Extract basic concepts

Color and texture measures can be extracted from the images in your data to be represented as concepts. See the functions:

get_color_measure(image, mask=None, type=None, verbose=True)

get_texture_measure(image, mask=None, type=None, verbose=True)
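
To give a feel for what a concept measure is, here is a minimal sketch in plain NumPy. It is not the library's implementation: the helper names `mean_channel_intensity` and `intensity_contrast` are hypothetical, the image is assumed to be an H x W x C array, and the mask is assumed to be an H x W boolean array. Each helper maps an image to a scalar or a small vector that can then be used as a continuous concept measure.

```python
import numpy as np


def mean_channel_intensity(image, mask=None):
    """Hypothetical helper: mean of each color channel, optionally inside a mask."""
    image = np.asarray(image, dtype=float)
    if mask is None:
        # Flatten the spatial dimensions and keep one column per channel.
        pixels = image.reshape(-1, image.shape[-1])
    else:
        # Boolean indexing keeps only the masked pixels, shape (k, C).
        pixels = image[np.asarray(mask, dtype=bool)]
    return pixels.mean(axis=0)


def intensity_contrast(image, mask=None):
    """Hypothetical helper: std of grayscale intensity, a crude texture proxy."""
    gray = np.asarray(image, dtype=float).mean(axis=-1)
    values = gray if mask is None else gray[np.asarray(mask, dtype=bool)]
    return float(values.std())
```

Restricting a measure to a mask is useful when the concept only makes sense inside a region, for example inside segmented nuclei.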

2. Find the concept vectors

We compute RCVs by least-squares linear regression of the concept measures for a set of inputs. The concept vector (RCV) represents the direction of greatest increase of the measures for a single continuous concept. Different options can be specified to compute the regression:

compute linear regression
compute ridge regression
compute local linear regression -- not yet supported

See the functions:

get_activations(model, layer, data, labels=None, pooling=None, param_update=False, save_fold='')

linear_regression(acts, measures, type='linear', evaluation=False, verbose=True)
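
The regression step can be sketched as follows, under some assumptions. The function `fit_rcv` and its `alpha` parameter are hypothetical and are not part of the library. Activations are assumed to be an array with one row per input (flattened if needed), and the RCV is taken to be the regression weight vector normalized to unit length. With `alpha=0` this is ordinary least squares; with `alpha>0` it is ridge regression in closed form.

```python
import numpy as np


def fit_rcv(acts, measures, alpha=0.0):
    """Hypothetical sketch: fit measures ~ acts and return the unit-norm direction."""
    acts = np.asarray(acts, dtype=float)
    acts = acts.reshape(len(acts), -1)  # one flattened activation vector per input
    measures = np.asarray(measures, dtype=float)

    # Center both sides so the intercept can be recovered afterwards.
    x_mean = acts.mean(axis=0)
    y_mean = measures.mean()
    xc = acts - x_mean
    yc = measures - y_mean

    if alpha > 0:
        # Ridge regression: solve (X^T X + alpha I) w = X^T y.
        n_features = xc.shape[1]
        weights = np.linalg.solve(xc.T @ xc + alpha * np.eye(n_features), xc.T @ yc)
    else:
        # Ordinary least squares (minimum-norm solution if underdetermined).
        weights, *_ = np.linalg.lstsq(xc, yc, rcond=None)

    intercept = y_mean - x_mean @ weights
    norm = np.linalg.norm(weights)
    rcv = weights / norm if norm > 0 else weights
    return rcv, weights, intercept
```

Because the fitted model is linear, its gradient with respect to the activations is the weight vector itself, which is why the normalized weights point in the direction of greatest increase of the measure.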

The regression is evaluated in different ways:

on training or held-out data, with R-squared, MSE and adjusted R-squared
by evaluating the angle between two RCVs
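
For reference, these evaluation quantities can be written out in a few lines of NumPy. This is a sketch using the standard textbook definitions, not the library's code; the names `regression_scores` and `angle_between` are hypothetical. Adjusted R-squared is only defined when the number of samples exceeds the number of features plus one, so the sketch returns NaN otherwise.

```python
import numpy as np


def regression_scores(y_true, y_pred, n_features):
    """Hypothetical helper: R-squared, MSE and adjusted R-squared of a fit."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_samples = len(y_true)

    residual = y_true - y_pred
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))

    mse = ss_res / n_samples
    r2 = 1.0 - ss_res / ss_tot

    dof = n_samples - n_features - 1
    # Adjusted R-squared penalizes the number of regressors; undefined if dof <= 0.
    adj_r2 = 1.0 - (1.0 - r2) * (n_samples - 1) / dof if dof > 0 else float("nan")
    return {"r2": r2, "mse": mse, "adj_r2": adj_r2}


def angle_between(u, v):
    """Hypothetical helper: angle in degrees between two concept vectors."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))
```

A small angle between two RCVs fitted on different subsets of the data suggests that the learned direction is stable, while an angle near 90 degrees suggests the two fits found unrelated directions.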