BEExAI is an open-source tool to quantitatively evaluate, compare and benchmark post-hoc explainable AI (feature attribution) methods on multiple machine learning models with an end-to-end pipeline.
The project is written entirely in Python 3.9 and tested on Windows 11 (64-bit). Both CPU and GPU are supported with PyTorch 2.0.1.
BEExAI can be installed from PyPI with:
pip install beexai
You can also install the project from source using:
- Clone the repo
git clone https://github.com/SquareResearchCenter-AI/BEExAI.git
- Install the requirements
cd BEExAI
pip install -r requirements.txt
To train a model and compute explanation attributions and evaluation metrics on tabular data, you need to specify a config file for each dataset. Several examples are available in config/ with the following format:
path: "data/my_dataset.csv"
target_col: "class"
datetime_cols:
- "date"
cols_to_delete:
- "ID"
cleaned_data_path: "output/data/my_dataset_cleaned.csv"
task: "classification"
The options are as follows:
- path: path to the dataset, usually placed in the data/ folder
- target_col: target column for training
- datetime_cols: columns in a datetime format, each of which is split into several integer columns (year, month, day, hour)
- cols_to_delete: columns to drop (for example ID columns)
- cleaned_data_path: path where the dataset is saved after preprocessing so it can be reused, usually in output/data
- task: classification or regression
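As a rough illustration of what the datetime_cols option describes, the sketch below splits a datetime value into year, month, day and hour integers using only the standard library. The function name split_datetime_column, the row-of-dicts layout and the default format string are assumptions made for this example; BEExAI's own preprocessing may differ in detail.

```python
from datetime import datetime


def split_datetime_column(rows, col, fmt="%Y-%m-%d %H:%M:%S"):
    """Replace `col` in each row with <col>_year, _month, _day and _hour integers.

    Hypothetical helper for illustration, not part of the BEExAI API.
    """
    out = []
    for row in rows:
        parsed = datetime.strptime(row[col], fmt)
        # Keep every other column untouched and drop the original datetime string
        new_row = {k: v for k, v in row.items() if k != col}
        new_row[f"{col}_year"] = parsed.year
        new_row[f"{col}_month"] = parsed.month
        new_row[f"{col}_day"] = parsed.day
        new_row[f"{col}_hour"] = parsed.hour
        out.append(new_row)
    return out


rows = [{"ID": 1, "date": "2023-05-17 14:30:00", "class": 0}]
print(split_datetime_column(rows, "date"))
```

The integer columns produced this way can be fed to any of the supported models without further encoding.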
Other operations, such as adding columns derived from existing ones or deleting specific values, must be done when instantiating the dataset in the notebooks or scripts.
Several notebooks are available in notebooks/ for simple use cases:
- The numbered series can be run in order with your own dataset or with the provided examples (the kickstarter and boston-credit datasets).
- all_in_one.ipynb combines the three notebooks into a single one, without the detailed explanations.
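import numpy as np

from beexai.dataset.dataset import Dataset
from beexai.dataset.load_data import load_data
from beexai.training.train import Trainer

DATA_NAME = "configname"  # name of the config file, without the .yml extension
MODEL_NAME = "NeuralNetwork"
CONFIG_PATH = f"config/{DATA_NAME}.yml"

# Load the raw data and the task described in the config file
df, target_col, task, _ = load_data(from_cleaned=False, config_path=CONFIG_PATH)
data = Dataset(df, target_col)
X_train, X_test, y_train, y_test = data.get_train_test()

# Assumption: one output per class for classification, a single output for regression
num_labels = len(np.unique(y_train)) if task == "classification" else 1
NN_PARAMS = {"input_dim": X_train.shape[1], "output_dim": num_labels}
trainer = Trainer(MODEL_NAME, task, NN_PARAMS)
trainer.train(X_train, y_train)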
from beexai.explaining import CaptumExplainer
from beexai.metrics.get_results import get_all_metrics

METHOD = "IntegratedGradients"
# sklearn=False: the model trained above is a neural network, not a scikit-learn estimator
exp = CaptumExplainer(trainer.model, task=task, method=METHOD, sklearn=False)
exp.init_explainer()

LABEL = 0  # target label passed to the metrics
get_all_metrics(X_test.values, LABEL, trainer.model, exp)
For more examples, please refer to the documentation.
The datasets used in this benchmark come from several OpenML suites.
The ones from "Why do tree-based models still outperform deep learning on typical tabular data?" are the suites with IDs 297, 298, 299 and 304.
The ones for multiclass classification are from tasks 12, 14, 16, 18, 22, 23, 28 and 32.
A simplified script to download them with the OpenML API and create their configuration files is available in the root folder.
python openml_download.py
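Each downloaded dataset needs a configuration file in the format shown earlier. As a minimal sketch (this is not the code in openml_download.py), the hypothetical helper render_config below builds such a file as text. Strings are quoted with json.dumps, which also produces valid YAML double-quoted scalars; the function name and its parameters are assumptions for this example.

```python
import json


def render_config(path, target_col, task, cleaned_data_path,
                  datetime_cols=(), cols_to_delete=()):
    """Return config text in the format of the files in config/ (hypothetical helper)."""
    if task not in ("classification", "regression"):
        raise ValueError(f"task must be 'classification' or 'regression', got {task!r}")

    def q(value):
        # JSON string quoting is also valid YAML double-quoted syntax
        return json.dumps(value)

    lines = [f"path: {q(path)}", f"target_col: {q(target_col)}"]
    # Optional list-valued keys are written only when they are non-empty
    for key, values in (("datetime_cols", datetime_cols),
                        ("cols_to_delete", cols_to_delete)):
        if values:
            lines.append(f"{key}:")
            lines.extend(f"  - {q(v)}" for v in values)
    lines.append(f"cleaned_data_path: {q(cleaned_data_path)}")
    lines.append(f"task: {q(task)}")
    return "\n".join(lines) + "\n"


print(render_config(
    "data/my_dataset.csv", "class", "classification",
    "output/data/my_dataset_cleaned.csv",
    datetime_cols=["date"], cols_to_delete=["ID"],
))
```

Writing the returned text to config/my_dataset.yml would give a file that can be passed as config_path.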
Benchmarks can be run with the script benchmetrics.py, which takes several arguments:
python benchmetrics.py --config_path config_folder --save_path output/my_benchmark --seed 42 --n_sample 1000
For comparison with the benchmarks in the benchmark_results folder, we used 1000 samples from the test set.
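The four flags in the command above could be parsed along the following lines. This is a sketch with argparse, not the actual contents of benchmetrics.py; the types, defaults and help strings are assumptions.

```python
import argparse


def build_parser():
    """Parser for the four flags shown in the command above (types and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Run an attribution benchmark (sketch).")
    parser.add_argument("--config_path", required=True,
                        help="folder containing the dataset config files")
    parser.add_argument("--save_path", required=True,
                        help="folder where results are written")
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    parser.add_argument("--n_sample", type=int, default=1000,
                        help="number of test samples to evaluate")
    return parser


# Parse the same arguments as in the example command
args = build_parser().parse_args([
    "--config_path", "config_folder",
    "--save_path", "output/my_benchmark",
    "--seed", "42",
    "--n_sample", "1000",
])
print(args.config_path, args.save_path, args.seed, args.n_sample)
```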
- benchmark_results: Complete benchmark results from our paper (insert_link), averaged over 5 random seeds
- config: Configuration files with basic information about your data. More complex operations on your data need to be done directly in the notebooks or scripts
- data: boston and kickstarter datasets from Kaggle
- notebooks: Simple use cases in notebook format
- output: Store outputs such as cleaned datasets, saved models and computed attributions
- src: Python scripts with main classes
- Linear Regression, Logistic Regression
- Random Forest
- Decision Tree
- Gradient Boosting
- XGBoost
- Dense Neural Network
- Perturbation based: FeatureAblation, Lime, ShapleyValueSampling, KernelShap
- Gradient based: Integrated Gradients, Saliency, DeepLift, InputXGradient
- Robustness: Sensitivity
- Faithfulness: Infidelity, Comprehensiveness, Sufficiency, Faithfulness Correlation, AUC-TP, Monotonicity
- Complexity: Complexity, Sparseness
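To give an idea of how an attribution method and two of the metrics listed above fit together, here is a small pure-Python sketch. It uses common textbook definitions: feature ablation against a baseline, complexity as the entropy of normalized absolute attributions, and sparseness as their Gini index. The function names are chosen for this example, and BEExAI's implementations may differ, for instance in normalization.

```python
import math


def feature_ablation(predict, x, baseline):
    """Attribution of feature i = predict(x) - predict(x with feature i set to its baseline)."""
    full = predict(x)
    attributions = []
    for i in range(len(x)):
        ablated = list(x)
        ablated[i] = baseline[i]
        attributions.append(full - predict(ablated))
    return attributions


def complexity(attributions):
    """Shannon entropy of the normalized absolute attributions (lower means simpler)."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    probs = [abs(a) / total for a in attributions]
    return -sum(p * math.log(p) for p in probs if p > 0)


def sparseness(attributions):
    """Gini index of the absolute attributions (0 means uniform, higher means more concentrated)."""
    values = sorted(abs(a) for a in attributions)
    n, total = len(values), sum(values)
    if total == 0:
        return 0.0
    weighted = sum((k + 1) * v for k, v in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


def toy_model(x):
    # Linear toy model: the second feature has no influence on the output
    return 2 * x[0] - x[2]


attrs = feature_ablation(toy_model, [1, 1, 1], [0, 0, 0])
print(attrs, complexity(attrs), sparseness(attrs))
```

On this toy model the unused second feature receives a zero attribution, which lowers the entropy and raises the Gini index compared with a uniform explanation.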
The proposed pipeline might not cover every possible customization, especially for data preprocessing. Feel free to add your own processing within the example notebooks.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
BEExAI is open-sourced under the BSD-3 license.
If you find BEExAI useful, please cite our work:
@inproceedings{sithakoul2024beexai,
  title={BEExAI: Benchmark to Evaluate Explainable AI},
  author={Sithakoul, Samuel and Meftah, Sara and Feutry, Cl{\'e}ment},
  booktitle={World Conference on Explainable Artificial Intelligence},
  pages={445--468},
  year={2024},
  organization={Springer}
}
Project Link: https://github.com/SquareResearchCenter-AI/BEExAI