
Interactive XAI System


Development of an Interactive Explainable AI System for End-users


Paper: Designing Interactive Explainable AI Systems for Lay Users

Authors: Miguel Angel Meza Martínez & Alexander Maedche

Developers: Felix Hasse, Nicolas Kiefer, Isabelle Konrad, Tilio Schulze & Miguel Angel Meza Martínez

Version: 1.0

[[TOC]]

First Steps

Environment and dependencies

Frontend

  1. Make sure that Node.js version 16 is installed on your system
  2. Run the following commands in a terminal in the project base directory:
npm install -g @vue/cli
npm install -g @vue/cli-service
cd Frontend/app
npm install

Backend

The Python requirements are saved in the "API/" folder. Depending on your OS, choose the matching requirements file and either create an environment with conda or install the packages directly with pip. Additionally, make sure to initialize the database by running the following script:

cd API/scr/
python database_creation.py

Run the application

Terminal 1:

cd API/scr/
python main.py

Terminal 2:

cd Frontend/app/
npm run serve

The interactive API docs are available at http://localhost:8000/docs. The root for all API requests is http://localhost:8000/.
The application GUI is available at http://localhost:8080/. The starting page is "/dataset". To access the admin panel, use "/admin".

Project structure

The application consists of a web-based frontend GUI and a backend API server. The frontend is built with the Vue.js framework, the API with the FastAPI Python framework.

(Figure: project structure)

Backend

Overview of Files

The backend files can be found in the folder "API/". Here is a short summary of what each file or directory does:

main.py:

  • starting point for FastAPI process
  • defines the API requests accessible by the front-end
  • contains documentation for interactive FastAPI docs (http://localhost:8000/docs)
  • launches explanation sub-processes defined in task_gen.py
  • defines manager process for shared process memory
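
For orientation, here is a minimal sketch of what such an endpoint can look like in FastAPI. The route, model and field names below are illustrative assumptions, not the actual definitions in main.py.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical response model; the real schemas live in models.py
class ApplicantResponse(BaseModel):
    instance_id: int
    NN_recommendation: str
    NN_confidence: float

@app.get("/applicants/{instance_id}", response_model=ApplicantResponse)
def get_applicant(instance_id: int):
    """Return a single loan application; this docstring would appear in the interactive docs."""
    return ApplicantResponse(instance_id=instance_id, NN_recommendation="approve", NN_confidence=0.87)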

models.py:

  • defines the pydantic models for requests, responses and other data schemas
  • using models ensures data validation and basic type conversion
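
As a brief illustration of the validation and type conversion that pydantic provides (the field names of this SingleResult stand-in are assumptions; see models.py for the real schema):

from pydantic import BaseModel, ValidationError

class SingleResult(BaseModel):  # hypothetical fields for illustration
    loan_id: int
    decision: str

# Basic type conversion: the string "42" is coerced to the int 42
r = SingleResult(loan_id="42", decision="approved")
print(r.loan_id)  # 42

# Validation: malformed fields raise a ValidationError
try:
    SingleResult(loan_id="not a number", decision="approved")
except ValidationError as e:
    print(e)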

constants.py:

  • defines commonly used strings and numbers to reduce magic strings
  • contains the constraints for the dataset attributes
  • defines dictionary mappings for data transformations
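
A hedged sketch of the kind of definitions such a module typically contains; all names and values below are made up for illustration:

# Illustrative only; the real names and values are defined in constants.py
DB_PATH = "database.db"
NUM_COUNTERFACTUALS = 5

# Example of a dictionary mapping used for data transformations
PURPOSE_LABELS = {
    0: "car (new)",
    1: "car (used)",
    2: "furniture/equipment",
}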

database_req.py:

  • interface for database interaction with the API
  • defines functions for database access, separating database logic and structure from the API request handlers (see the sketch below)
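
A minimal sketch of such an access function, assuming the sqlite3 standard-library module; the table and column names are assumptions:

import sqlite3

def get_applicant(instance_id: int, db_path: str = "database.db") -> tuple:
    """Fetch one row from the applicants table (illustrative helper)."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT * FROM applicants WHERE instance_id = ?", (instance_id,)
        )
        return cur.fetchone()
    finally:
        con.close()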

task_gen.py:

  • defines explanation sub-process logic for LIME and SHAP
  • access to shared memory (job queue, explanation results dictionary)
  • defines timeout logic for explanation results
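
A rough sketch of what such a calculation worker could look like, assuming a Manager-backed job queue and results dictionary; compute_explanation and all other names here are hypothetical:

import queue
import time

RESULT_TIMEOUT = 60  # seconds; illustrative value

def compute_explanation(instance):
    # Stand-in for the LIME/SHAP generation in lime_utils.py / shap_utils.py
    return {"dummy": True}

def calculation_worker(job_queue, results):
    # Runs inside each spawned sub-process; job_queue and results are
    # Manager-backed shared objects
    while True:
        try:
            job_id, instance = job_queue.get(timeout=1)
        except queue.Empty:
            continue
        results[job_id] = {
            "created": time.time(),
            "explanation": compute_explanation(instance),
        }
        # Results older than RESULT_TIMEOUT could be evicted here,
        # mirroring the timeout logic described above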

lime_utils.py:

  • helper class for the LIME explanation generation
  • uses code from the xai reference project
  • loads the German Credit dataset and the smote_ey.tf model and fits a preprocessor
  • creates a LIME explainer
  • contains a method to get the LIME explanation for a given instance and return it in the JSON format specified by the API (a sketch of the explainer setup follows below)
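
A minimal sketch of creating and querying a LIME explainer with the lime package; the training data, feature names and prediction function are placeholders, not the actual objects used in lime_utils.py:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# Placeholders; the real data and model come from DataLoader_ey.py / smote_ey.tf
X_train = np.random.rand(100, 4)
feature_names = ["duration", "amount", "age", "savings"]

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["reject", "approve"],
    mode="classification",
)

def predict_fn(x):
    # Stand-in for model.predict; must return class probabilities
    p = np.random.rand(len(x), 1)
    return np.hstack([p, 1 - p])

exp = explainer.explain_instance(X_train[0], predict_fn, num_features=4)
print(exp.as_list())  # [(feature condition, weight), ...]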

shap_utils.py:

  • helper class for the SHAP explanation generation, analogous to lime_utils.py

preproc.pickle:

  • the pickled preprocessor that is fitted on the dataset (see lime_utils.py)

DataLoader_ey.py:

  • taken from the xai reference project and slightly adapted
  • method data_loader to load the credit data, drop the unused columns and remove outliers
  • preprocessing method for preparing data for model prediction
  • method for adding AI recommendation and confidence
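
A hedged sketch of what such a loading pipeline can look like with pandas; the file name, the outlier rule and the column "amount" are assumptions (the dropped columns are the ones named in the database section below):

import pandas as pd

def data_loader(path: str = "Data/german_credit.csv") -> pd.DataFrame:
    """Load the credit data, drop unused columns and remove outliers (illustrative)."""
    df = pd.read_csv(path)
    df = df.drop(columns=["foreignworker", "status_sex", "classification_"])
    # Example outlier rule; the real thresholds live in DataLoader_ey.py
    df = df[df["amount"] < df["amount"].quantile(0.99)]
    return df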

database.db:

  • SQLite database storing the loan applications, the counterfactual information and the experiment data
  • an overview of the structure and the interaction methods is provided below

smote_ey.tf/:

  • the trained TensorFlow model from the xai reference project
  • used for the AI recommendation, its confidence and explanation generation
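
Loading such a saved model is straightforward with TensorFlow; a short sketch, with the input construction as a placeholder:

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("smote_ey.tf")
# Placeholder input; the real preprocessing is done by the fitted preprocessor
instance = np.zeros((1, model.input_shape[1]), dtype=np.float32)
probabilities = model.predict(instance)
confidence = float(probabilities.max())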

Data/:

  • folder containing the raw data that was read into the database
  • contains the German Credit dataset in CSV format
  • contains the counterfactuals in JSON format: all_counterfactuals.json holds the initially generated counterfactuals, cfs_response_format.json a reformatted version that was added to the database

SQLite Database

The application uses a SQLite database to store the cleaned German Credit dataset (GCD), the experiments and their corresponding results, as well as pre-generated DICE counterfactuals for each loan application. The Python module sqlite3 is used as the database interface for the API.
An overview of the different tables and their structure is provided below, followed by an overview of the relevant database interaction functions. These are defined in the file database_req.py, which is located in the "API/" folder.


Database Table Structure: (Figure: database table structure)

All cleaned applications from the GCD are stored in the applicants table. The attributes foreignworker, status_sex and classification_ were dropped, and NN_recommendation and NN_confidence were added. These two attributes hold the actual model prediction for the application and were determined using the smote_ey.tf model.
The dice table contains the pre-generated counterfactuals for the applications of the GCD. The counterfactuals column stores this information in JSON format: each JSON has the key 'counterfactuals', which refers to a list of 5 counterfactuals in JSON format. This wrapping is necessary because SQLite cannot store lists.
The elements in the dice table have the primary key instance_id, which refers to the ids in the applicants table. Therefore, when an element in the applicants table is deleted or its id changes, the corresponding element in the dice table should be deleted or changed as well.
The experimentation functionality is covered by the experiments and results tables. The experiments table stores all the relevant information for the defined experiments, such as the type of explanation and the ids of the applications to be shown, in JSON format. This JSON contains the keys defined for the ExperimentInformation model in the models.py file. The results table stores the decisions a user made for a certain experiment, also in JSON format. The JSON contains nested JSONs in the SingleResult format defined in the models.py file, keyed by their loan id.
The key experiment_name of the results table references the attribute name in the experiments table. Therefore, when an experiment is deleted or renamed, this change should also be applied to the results table.
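
Based on the description above, the schema can be sketched roughly as follows; all column names and types are assumptions inferred from the text, not the actual schema:

import sqlite3

con = sqlite3.connect(":memory:")  # ':memory:' so the sketch does not touch database.db
con.executescript("""
CREATE TABLE applicants (
    instance_id INTEGER PRIMARY KEY,  -- id column name is an assumption
    -- remaining cleaned GCD attributes omitted
    NN_recommendation TEXT,
    NN_confidence REAL
);
CREATE TABLE dice (
    instance_id INTEGER PRIMARY KEY REFERENCES applicants(instance_id),
    counterfactuals TEXT  -- JSON with key 'counterfactuals' -> list of 5 counterfactuals
);
CREATE TABLE experiments (
    name TEXT PRIMARY KEY,
    information TEXT  -- JSON with the ExperimentInformation keys
);
CREATE TABLE results (
    experiment_name TEXT REFERENCES experiments(name),
    results TEXT  -- JSON of SingleResult objects keyed by loan id
);
""")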


Database Interaction: (Figure: overview of the database interaction functions defined in database_req.py)


XAI Explanations

The XAI methods used in this project are LIME, SHAP and DICE. As the generation of counterfactuals with DICE takes a long time (1.5 - 3 minutes), the counterfactuals have been pre-generated for each instance of the GCD and stored in the dice table. They can therefore not be generated dynamically.
For SHAP and LIME, however, the explanations are computed in the backend, which makes it possible to dynamically generate what-if analyses for modified dataset instances.
To generate explanations efficiently, the API checks the number of available CPU cores of the server it is running on and spawns calculation processes accordingly. The sole task of these calculation processes is to generate explanations when clients request them.
The requested explanation tasks are saved in a FIFO queue to which all calculation processes have access. One of the running calculation processes takes a task from the queue and, once it has finished generating the explanation, writes the result to a process-shared dictionary. The user can access the generated explanation using the id returned by the API when the explanation was scheduled.
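
From a client's perspective the flow is schedule-then-poll. A hedged sketch using the requests library; the endpoint paths, payload and status-code behaviour are assumptions:

import time
import requests

BASE = "http://localhost:8000"

# Hypothetical scheduling endpoint; assumed to return an id for the queued task
job_id = requests.post(f"{BASE}/explanations/lime", json={"instance_id": 1}).json()["id"]

# Poll until a calculation process has written the result to shared memory
while True:
    resp = requests.get(f"{BASE}/explanations/{job_id}")
    if resp.status_code == 200:
        print(resp.json())
        break
    time.sleep(1)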

(Figure: explanation computation & data flow; explanation task flow shown in blue, explanation result flow in purple)

Frontend

You can find detailed documentation of all the components used in the Frontend here.