
Araucana XAI

Tree-based local explanations of machine learning model predictions


Repository for the araucanaxai package. Implementation of the pipeline first described in Parimbelli et al., 2023.

Table of Contents
  1. About The Project
  2. Installation
  3. Usage
  4. Publications
  5. Contacts and Useful Links
  6. License

About The Project

Increasingly complex learning methods such as boosting, bagging and deep learning have made ML models more accurate, but harder to understand and interpret. Practitioners often face a trade-off between performance and intelligibility, especially in high-stakes applications such as medicine. This project proposes a novel methodological approach for explaining the prediction that a generic ML model makes for a specific instance. Advantages of the proposed XAI approach include improved fidelity to the original model, the ability to deal with non-linear decision boundaries, and native support for both classification and regression problems.
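The general idea behind tree-based local explanations can be sketched with plain scikit-learn: select the training instances closest to the one being explained, relabel them with the black-box model's predictions, fit a shallow decision tree on them, and measure how often the tree agrees with the model (its local fidelity). The helper `local_tree_explanation` below is hypothetical and deliberately simplified (Euclidean distance on unscaled features, no oversampling, no categorical handling); it is not the araucanaxai implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def local_tree_explanation(x_target, X_train, predict_fun, neighbourhood_size=150, max_depth=3):
    """Fit a shallow decision tree that mimics predict_fun around x_target (hypothetical helper)."""
    # 1. Select the training instances closest to the target (Euclidean, unscaled: a simplification)
    nn = NearestNeighbors(n_neighbors=neighbourhood_size).fit(X_train)
    _, idx = nn.kneighbors(x_target.reshape(1, -1))
    X_local = X_train[idx[0]]
    # 2. Label the neighbourhood with the black-box predictions, not with the ground truth
    y_local = predict_fun(X_local)
    # 3. Fit an interpretable surrogate tree on the relabelled neighbourhood
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X_local, y_local)
    # 4. Local fidelity: share of neighbours on which the surrogate agrees with the black box
    fidelity = float(np.mean(surrogate.predict(X_local) == y_local))
    return surrogate, fidelity


X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(solver="liblinear").fit(X, y)  # the black box to explain
surrogate, fidelity = local_tree_explanation(X[0], X, model.predict)
print(f"surrogate depth: {surrogate.get_depth()}, local fidelity: {fidelity:.2f}")
```

Fidelity is measured here on the neighbourhood itself; a held-out set of nearby points would give a less optimistic estimate.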

Keywords: explainable AI, explanations, local explanation, fidelity, interpretability, transparency, trustworthy AI, black-box, machine learning, feature importance, decision tree, CART, AIM.


Installation

  1. Make sure you have the latest version of pip installed
    pip install --upgrade pip
  2. Install araucanaxai through pip
    pip install araucanaxai


Usage

Here's a basic example, using a built-in toy dataset, that illustrates common Araucana XAI usage.

First, train a classifier on the data. Araucana XAI is model-agnostic: you only have to provide a function that takes data as input and outputs binary labels.
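Because only a prediction function is required, `predict_fun` does not have to be a scikit-learn `predict` method. As a hedged sketch, the hypothetical wrapper below turns a random forest's class probabilities into binary labels with a custom decision threshold (0.3, an arbitrary choice). It assumes that Araucana XAI passes a single 2-D array to `predict_fun`, which matches the signature of `classifier.predict` used in the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def predict_fun(data, threshold=0.3):
    """Hypothetical wrapper: map a 2-D array of instances to binary labels (0/1)."""
    data = np.atleast_2d(data)  # accept a single instance as well as a batch
    proba_positive = forest.predict_proba(data)[:, 1]  # probability of class 1
    return (proba_positive >= threshold).astype(int)


labels = predict_fun(X[:5])
print(labels)
```

Such a wrapper would then be passed as `predict_fun=predict_fun` in the call to `araucanaxai.run`.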

Then, select the instance whose classification you want to explain.

Finally, run Araucana XAI and plot the resulting XAI tree, which explains the model's decision as a set of IF-ELSE rules.

import araucanaxai
from sklearn.linear_model import LogisticRegression
from sklearn import tree
from sklearn.metrics import precision_score, recall_score
import matplotlib.pyplot as plt

# load toy dataset with both categorical and numerical features
cat_data = True # set to False if you don't need categorical features 
data = araucanaxai.load_breast_cancer(train_split=.75, cat=cat_data)

# specify which features are categorical
cat = data["feature_names"][0:5]
is_cat = [x in cat for x in data["feature_names"]] # set to None if you don't need categorical data

# train logistic regression classifier: this is the model to explain
classifier = LogisticRegression(random_state=42, solver='liblinear', penalty='l1', max_iter=500)
classifier.fit(data["X_train"], data["y_train"])
y_test_pred = classifier.predict(data["X_test"])

print(f'precision: {precision_score(data["y_test"], y_test_pred)}, '
      f'recall: {recall_score(data["y_test"], y_test_pred)}')

# declare the instance we want to explain
index = 65
instance = data["X_test"][index, :].reshape(1, data["X_test"].shape[1])
instance_pred_y = y_test_pred[index]

# build xai tree to explain the instance classification
# the neighbourhood size determines how many of the closest training instances are used for the local explanation
# different oversampling strategies are available for data augmentation: SMOTE, random uniform and random non-uniform (based on sample statistics)
# the xai tree pruning can be controlled in terms of maximum depth and minimum number of instances per leaf
xai_tree = araucanaxai.run(x_target=instance, y_pred_target=instance_pred_y,
                           x_train=data["X_train"], feature_names=data["feature_names"], cat_list=is_cat,
                           neighbourhood_size=150, oversampling=True,
                           oversampling_type="smote", oversampling_size=100,
                           max_depth=3, min_samples_leaf=1,
                           predict_fun=classifier.predict)

# plot the tree
fig, ax = plt.subplots(figsize=(10, 10))
tree.plot_tree(xai_tree['tree'], feature_names=data["feature_names"], filled=True,
               class_names=data["target_names"], ax=ax)
plt.tight_layout()
plt.show()
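Because `xai_tree['tree']` is passed to `sklearn.tree.plot_tree`, it appears to be a fitted scikit-learn decision tree. If so, its IF-ELSE rules can also be printed as text with `sklearn.tree.export_text`, and the single rule that applies to the explained instance can be read off its decision path. The sketch below uses a stand-in tree fitted on scikit-learn's own copy of the breast cancer data so that it runs on its own; `rule_for_instance` is a hypothetical helper. In the example above you would pass `xai_tree['tree']`, `instance` and `data["feature_names"]` instead.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for xai_tree['tree']: a shallow tree fitted on scikit-learn's copy of the dataset
bc = load_breast_cancer()
stand_in_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(bc.data, bc.target)
feature_names = list(bc.feature_names)

# Full rule set of the tree, one indented line per split
print(export_text(stand_in_tree, feature_names=feature_names))


def rule_for_instance(fitted_tree, x, feature_names):
    """Hypothetical helper: conjunction of the split conditions satisfied by instance x."""
    # scikit-learn trees compare float32 feature values, so cast to match their routing exactly
    x = np.asarray(x, dtype=np.float32).reshape(1, -1)
    node_ids = fitted_tree.decision_path(x).indices  # nodes visited from root to leaf
    leaf_id = fitted_tree.apply(x)[0]
    conditions = []
    for node_id in node_ids:
        if node_id == leaf_id:
            continue  # leaves carry no split condition
        feat = fitted_tree.tree_.feature[node_id]
        thr = fitted_tree.tree_.threshold[node_id]
        op = "<=" if x[0, feat] <= thr else ">"  # left child means value <= threshold
        conditions.append(f"{feature_names[feat]} {op} {thr:.2f}")
    return " AND ".join(conditions)


print("IF", rule_for_instance(stand_in_tree, bc.data[0], feature_names),
      "THEN class", stand_in_tree.predict(bc.data[:1])[0])
```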

A worked example is also available as a Jupyter notebook in the repository.

See the open issues for a full list of proposed features (and known issues).
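The oversampling options named in the usage example comments can be illustrated with a small NumPy sketch. The hypothetical helper `oversample` generates synthetic points around a neighbourhood in three ways: uniformly within each feature's observed range, from a Gaussian built on each feature's mean and standard deviation (an assumed reading of "random non-uniform, based on sample statistics"), and by SMOTE-style interpolation between a point and one of its nearest neighbours. It handles numerical features only and is not the araucanaxai implementation; in particular, the original SMOTE interpolates within a single class, whereas this simplification ignores labels.

```python
import numpy as np


def oversample(X_local, n_new, strategy="uniform", k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic rows around the neighbourhood X_local."""
    rng = np.random.default_rng(seed)
    n, d = X_local.shape
    if strategy == "uniform":
        # Random uniform: each feature sampled independently within the neighbourhood's min/max
        low, high = X_local.min(axis=0), X_local.max(axis=0)
        return rng.uniform(low, high, size=(n_new, d))
    if strategy == "non-uniform":
        # Random non-uniform (assumed Gaussian): per-feature sample mean and standard deviation
        mean, std = X_local.mean(axis=0), X_local.std(axis=0)
        return rng.normal(mean, std, size=(n_new, d))
    if strategy == "smote":
        # SMOTE-style, class-agnostic: interpolate between a random row and one of its k nearest rows
        dists = np.linalg.norm(X_local[:, None, :] - X_local[None, :, :], axis=2)
        neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]  # column 0 is (usually) the row itself
        base = rng.integers(0, n, size=n_new)
        partner = neighbours[base, rng.integers(0, neighbours.shape[1], size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        return X_local[base] + gap * (X_local[partner] - X_local[base])
    raise ValueError(f"unknown strategy: {strategy}")


X_local = np.random.default_rng(1).normal(size=(150, 4))  # toy neighbourhood of 150 instances
for strategy in ("uniform", "non-uniform", "smote"):
    print(strategy, oversample(X_local, n_new=100, strategy=strategy).shape)
```

Uniform sampling and SMOTE-style interpolation both stay inside the neighbourhood's bounding box, while the Gaussian variant can generate points beyond the observed range.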


Publications

List of publications involving AraucanaXAI

  • E Parimbelli, TM Buonocore, G Nicora, W Michalowski, S Wilk, R Bellazzi - Why did AI get this one wrong? Tree-based explanations of machine learning model predictions - Artificial Intelligence in Medicine, Volume 135, 2023

  • TM Buonocore, G Nicora, A Dagliati, E Parimbelli - Evaluation of XAI on ALS 6-months Mortality Prediction - Proceedings of the Working Notes of CLEF 2022, Volume 3180, 2022

  • E Parimbelli, G Nicora, S Wilk, W Michalowski, R Bellazzi - Tree-based Local Explanations of Machine Learning Model Predictions - XAI Healthcare workshop, AIME 2021

If you use the AraucanaXAI software for your projects, please cite it as:

@software{Buonocore_Araucana_XAI_2022,
  author = {Buonocore, Tommaso Mario and Nicora, Giovanna and Parimbelli, Enea},
  doi = {10.5281/zenodo.1234},
  month = {9},
  title = {{Araucana XAI}},
  url = {https://github.com/detsutut/AraucanaXAI},
  version = {1.0.0},
  year = {2022}
}


Contacts and Useful Links


License

Distributed under the MIT License. See LICENSE for more information.
