Measuring Feature Importance of Symbolic Regression Models Using Partial Effects

We studied how the symbolic regression algorithm Interaction-Transformation Evolutionary Algorithm (ITEA) can use Partial Effects to measure feature importance, and we compared its explanations with those produced by SHAP, LIME, and ELI5.

This repository contains all experiment scripts, data sets, the ITEA implementation, and the results used in the paper Measuring Feature Importance of Symbolic Regression Models Using Partial Effects, submitted to GECCO 2021.

Paper abstract: In explainable AI, one aspect of a prediction’s explanation is to measure each predictor’s importance to the decision process. The importance can measure how much variation a predictor promotes locally or how much the predictor contributes to the deviation from a reference point (Shapley value). If we have the ground truth analytical model, we can calculate the former using the Partial Effect, calculated as the predictor’s partial derivative. Also, we can estimate the latter by calculating the average partial effect multiplied by the difference between the predictor and the reference value. Symbolic Regression is a gray-box model for regression problems that returns an analytical model approximating the input data. Although it is often associated with interpretability, few works explore this property. This paper will investigate the use of Partial Effect with the analytical models generated by the Interaction-Transformation Evolutionary Algorithm symbolic regressor (ITEA). We show that the regression models returned by ITEA coupled with Partial Effect provide the closest explanations to the ground-truth and a close approximation to Shapley values. These results open up new opportunities to explain symbolic regression models compared to the approximations provided by model-agnostic approaches.
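The two quantities described in the abstract can be illustrated numerically. The sketch below is not the repository's implementation: it uses a made-up IT-style expression (a linear combination of transformation functions applied to products of powered predictors), estimates partial effects with central finite differences, and reads "average partial effect" as the average along the straight path from the reference to the point, which is the Integrated Gradients formulation. The names `model`, `partial_effects`, and `path_attributions`, the coefficients, and the reference point are all hypothetical.

```python
import numpy as np

# Hypothetical IT-style model (made-up terms and coefficients):
# f(x) = 2.0 * sin(x0^1 * x1^1) + 0.5 * log(x2^2) + 1.0
def model(X):
    X = np.atleast_2d(X)
    return 2.0 * np.sin(X[:, 0] * X[:, 1]) + 0.5 * np.log(X[:, 2] ** 2) + 1.0

def partial_effects(f, X, h=1e-5):
    """Central finite-difference estimate of df/dx_j at every row of X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, d = X.shape
    pe = np.empty((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = h
        pe[:, j] = (f(X + step) - f(X - step)) / (2 * h)
    return pe

def path_attributions(f, x, reference, steps=200):
    """Average partial effect along the straight path reference -> x
    (midpoint rule over alpha in (0, 1)), multiplied by (x - reference).
    This is the Integrated Gradients formulation, assumed here as one
    reading of "average partial effect"."""
    x = np.asarray(x, dtype=float)
    reference = np.asarray(reference, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps
    path = reference + alphas[:, None] * (x - reference)
    avg_pe = partial_effects(f, path).mean(axis=0)
    return avg_pe * (x - reference)

x = np.array([1.0, 2.0, 3.0])
reference = np.array([0.5, 0.5, 1.0])  # hypothetical reference point
print("partial effects at x:", partial_effects(model, x)[0])
attr = path_attributions(model, x, reference)
print("attributions:", attr)
# The attributions should sum to (approximately) f(x) - f(reference).
print("sum vs f(x) - f(reference):", attr.sum(), model(x)[0] - model(reference)[0])
```

For a linear model f(x) = w·x + b the partial effects are constant, so each attribution reduces to w_j (x_j - r_j), which is the Shapley value of a linear model with respect to the reference point r.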

Please cite us as:

@INPROCEEDINGS{MeasuringFeatureImportanceITEA,
    numpages={9},
    year = 2021,
    month = {jul},
    publisher = {{ACM}},
    author = {Guilherme Seidyo Imai Aldeia and Fabrício Olivetti de França},
    title = {Measuring Feature Importance of Symbolic Regression Models Using Partial Effects},
    booktitle = {2021 {ACM} Genetic and Evolutionary Computation Conference ({GECCO})}
}

Folder structure

.
├── datasets
├── docs 
├── results
│   ├── figures
│   ├── tabular_processed
│   └── tabular_raw      
└── src
    ├── analysis
    ├── experiments      
    └── itea

datasets: all data sets used in the experiments;
docs: short markdown files documenting aspects of this repository;
results/figures: PDF figures generated by the analysis scripts;
results/tabular_processed: processed and merged final tables;
results/tabular_raw: raw result files generated by the experiment scripts;
src/analysis: scripts that analyze the data and generate the figures;
src/experiments: scripts that run the experiments;
src/itea: source code of the ITEA implementation, as used in the paper.

Dependencies

Since future library updates may introduce compatibility errors, the table below lists the library versions used in the experiments.

Library	Version
numpy	1.19.1
pandas	1.1.0
shap	0.36.0
lime	0.2.0.1
filelock	3.0.12
autograd	1.3
scipy	1.5.2
eli5	0.10.1
scikit-learn	0.23.2
statsmodels	0.11.1
typing	3.7.4.1
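To compare a local environment against the versions above, a small check such as the following may help. It is a sketch, not a script shipped with this repository: it assumes Python 3.8+ (for `importlib.metadata`), assumes that "scikit" refers to the `scikit-learn` distribution, and the names `PINNED` and `check_versions` are hypothetical. Note that `typing` is the PyPI backport; on Python 3.5+ the standard-library module needs no separate install, so it may be reported as not installed.

```python
from importlib import metadata

# Versions from the table above ("scikit" assumed to mean scikit-learn).
PINNED = {
    "numpy": "1.19.1",
    "pandas": "1.1.0",
    "shap": "0.36.0",
    "lime": "0.2.0.1",
    "filelock": "3.0.12",
    "autograd": "1.3",
    "scipy": "1.5.2",
    "eli5": "0.10.1",
    "scikit-learn": "0.23.2",
    "statsmodels": "0.11.1",
    "typing": "3.7.4.1",
}

def check_versions(pinned):
    """Return {distribution: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for dist, expected in pinned.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed
        if installed != expected:
            mismatches[dist] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for dist, (expected, installed) in check_versions(PINNED).items():
        print(f"{dist}: expected {expected}, found {installed or 'not installed'}")
```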

License

This source code is distributed under the GNU GENERAL PUBLIC LICENSE.

Acknowledgments

This project is funded by the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant number 2018/14173-8, and by the Fundação Universidade Federal do ABC.
