Codebase for the paper "Are Large Language Models Post Hoc Explainers?"
Installation instructions without pre-made conda environment
1. install anaconda
2. open terminal
3. conda update -n base -c defaults conda
4. conda create -n LLM_PostHocExplainer python=3.10
5. conda activate LLM_PostHocExplainer
6. Install packages:
conda install pytorch
conda install pandas
conda install requests
conda install scikit-learn
conda install captum -c pytorch
conda install tqdm
conda install lime
conda install openai
OR
Installation instructions with pre-made conda environment
1. install anaconda
2. open terminal
3. conda update -n base -c defaults conda
4. conda env create -f LLM_PostHocExplainer.yml
5. conda activate LLM_PostHocExplainer
This repository is organized as follows:
The data folder contains the pre-processed Blood, COMPAS, Credit and Adult datasets.
The llms folder contains code for prompt generation, LLM API calls, and response processing.
The models folder contains pre-trained Logistic Regression (LR) and Large Artificial Neural Network (ANN-L) classifiers.
The openxai folder contains code from Agarwal et al. 2022 (post-hoc explanations, perturbations, faithfulness scores).
The notebooks folder contains demonstrations such as model training and model inspection.
The outputs folder stores results from post-hoc explainers and LLM explainers.
Prompt texts are located in the prompts.json file. Prompts are constructed via pre_text + ICL samples + mid_text + test sample + post_text. Note that in many cases the test sample is not included, and the mid_text string is therefore empty. Where variables are required in the prompt text, these are enclosed in square brackets in prompts.json and defined in a dictionary within LLM_PostHocPipeline.py.
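For illustration, here is a minimal sketch of that assembly, assuming each prompt ID in prompts.json stores pre_text, mid_text and post_text fields (the ICL and test-sample strings below are placeholders; the real assembly happens inside LLM_PostHocPipeline.py):

    # Sketch only: assemble a prompt as pre_text + ICL samples + mid_text + test sample + post_text.
    # Assumes each prompt ID in prompts.json stores "pre_text", "mid_text" and "post_text" fields.
    import json

    with open("prompts.json") as f:
        prompts = json.load(f)

    entry = prompts["pfpe2-topk"]  # one of the prompt IDs listed below

    # Placeholder strings; the pipeline generates these from model predictions on perturbations.
    icl_block = "\nChange in Input: A: -0.081, B: -0.066\nChange in Output: 0"
    test_block = ""  # often empty, in which case mid_text is also empty

    prompt = entry["pre_text"] + icl_block + entry["mid_text"] + test_block + entry["post_text"]
    print(prompt)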
As noted in the paper, perturbations are presented in two main formats: as the raw perturbed inputs alongside their corresponding outputs (shown in the Sec. 3.1 and 3.2 templates), or as the change between each perturbed input and the test sample together with the corresponding change in output (shown in Sec. 3.3). The second approach significantly helps the LLM discern the most important features: by providing only the changes relative to the test sample, it removes the need for the LLM to compute these differences internally.
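As a toy illustration of the two formats (hypothetical values; not the repository's rendering code):

    # Toy illustration of the two perturbation formats (hypothetical values).
    import numpy as np

    rng = np.random.default_rng(0)
    x_test = np.array([0.5, 0.3, 0.8])              # test sample
    x_pert = x_test + rng.normal(0.0, 0.1, size=3)  # one Gaussian perturbation
    y_test, y_pert = 1, 0                           # placeholder model predictions

    # Format 1 (Sec. 3.1/3.2 templates): raw perturbed input and its output
    print("Input:", np.round(x_pert, 3), "Output:", y_pert)

    # Format 2 (Sec. 3.3 template): change relative to the test sample and change in output
    print("Change in Input:", np.round(x_pert - x_test, 3), "Change in Output:", y_pert - y_test)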
When using perturbed inputs (the first format), sections 3.1 to 3.3 use prompt IDs io1-topk-v2, pe2-topk-v2, and pfp2-v2, respectively.
When using perturbations (the second format, i.e. changes relative to the test sample), sections 3.1 to 3.3 use prompt IDs pfp-io1-topk, pfpe2-topk, and pfp2, respectively.
Section 3.4 uses the prompt ID icl_exp (add_explanation should be set to true for this section).
When choosing a given prompt ID, corresponding prompt parameters should be updated accordingly (see below). These may be tweaked until the prompt text appears as desired.
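For example, to switch to the Section 3.4 (Explanation-Based ICL) setting, you could update the config by hand or programmatically; a sketch, assuming the config keys documented below:

    # Sketch: point the pipeline at the Section 3.4 prompt by editing LLM_pipeline_config.json.
    # Keys follow the parameter documentation below; editing the file by hand works equally well.
    import json

    with open("LLM_pipeline_config.json") as f:
        cfg = json.load(f)

    cfg["prompt_params"]["prompt_ID"] = "icl_exp"
    cfg["prompt_params"]["add_explanation"] = True

    with open("LLM_pipeline_config.json", "w") as f:
        json.dump(cfg, f, indent=2)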
There are two main files you need:
LLM_PostHocPipeline.py (queries the LLM with a prompt)
FaithfulnessPipeline.py (parses the reply and calculates faithfulness)
Each step saves relevant information to the outputs folder.
To generate explanations from a given LLM, run the following command:
python3 LLM_PostHocPipeline.py
The parameters used are located in the config file LLM_pipeline_config.json.
data_name — Name of the dataset to use: "blood", "compas", "credit" or "adult" (default — "adult")
data_scaler — Scaler for the data: "minmax", "standard" or "none" (default — "minmax")
model_name — Name of the model to use, e.g., "lr", "ann_l", etc. (default — "lr")
base_model_dir — Directory of the saved model (default — "./models/ClassWeighted_scale_minmax/")
output_dir — Directory to save LLM results to (default — "./outputs/LLM_QueryAndReply/")
openai_api_key_file_path — File path to your OpenAI API key (default — "./openai_api_key.txt")
WARNING: Do NOT share or push your LLM (OpenAI) API key to GitHub; it is best to add openai_api_key.txt to the .gitignore file or keep it outside of your project directory on your local machine.
LLM_name — Name of the LLM model (default — "gpt-4")
temperature — Parameter controlling the randomness of the LLM's output (default — 0)
eval_min_idx — The minimum test sample index for evaluation (default — 0)
eval_max_idx — The maximum test sample index for evaluation (default — 100)
max_test_samples — A hard limit on the number of test samples to evaluate (default — 100)
SEED — Seed value for reproducibility (default — 0)
n_shot — The number of examples used for in-context learning (ICL) (default — 16)
icl_params — Parameters for controlling the generation of ICL examples, see below
sampling_params — Parameters for the different sampling strategies, see below
prompt_params — Parameters for controlling the prompt of the model, see below
experiment_params — Parameters used to identify the experiment, see below
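Putting the documented defaults together, a minimal LLM_pipeline_config.json looks roughly like the sketch below (the nested dictionaries are left empty here and documented in the following sections; consult the file shipped with the repository for the exact structure):

    # Sketch of the documented top-level keys of LLM_pipeline_config.json with their defaults.
    import json

    config = {
        "data_name": "adult",
        "data_scaler": "minmax",
        "model_name": "lr",
        "base_model_dir": "./models/ClassWeighted_scale_minmax/",
        "output_dir": "./outputs/LLM_QueryAndReply/",
        "openai_api_key_file_path": "./openai_api_key.txt",
        "LLM_name": "gpt-4",
        "temperature": 0,
        "eval_min_idx": 0,
        "eval_max_idx": 100,
        "max_test_samples": 100,
        "SEED": 0,
        "n_shot": 16,
        "icl_params": {},         # see the icl_params section below
        "sampling_params": {},    # see the sampling_params section below
        "prompt_params": {},      # see the prompt_params section below
        "experiment_params": {},  # see the experiment_params section below
    }
    print(json.dumps(config, indent=2))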
The prompts for querying the LLM are stored in the appropriate config file (default — prompts.json).
The icl_params dictionary contains the following parameters:
use_most_confident — Boolean controlling whether to use random perturbations vs. perturbations with the most confident predictions (default — true)
use_class_balancing — Boolean controlling whether or not to balance class labels when selecting perturbations (default — true)
icl_seed — The seed used to generate ICL samples (default — 0)
sorting — The order of ICL examples: "alternate" (alternate between class 0 and class 1) or "shuffle" (random shuffle) (default — "shuffle")
sampling_scheme — ICL sampling strategy, see below (default — "perturb")
explanation_method — Type of post-hoc explanation to use for Explanation-Based ICL (default — "lime")
explanation_sampling — Sampling method for Explanation-Based ICL (default — "balanced")
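To make the sorting options concrete, here is a toy sketch of the "alternate" and "shuffle" orderings (not the repository's selection code):

    # Toy sketch: ordering ICL examples under the two "sorting" options.
    import random

    examples = [("x1", 0), ("x2", 1), ("x3", 0), ("x4", 1)]  # (input, predicted class)

    # "alternate": interleave class-0 and class-1 examples
    class0 = [e for e in examples if e[1] == 0]
    class1 = [e for e in examples if e[1] == 1]
    alternate = [e for pair in zip(class0, class1) for e in pair]

    # "shuffle": random order, seeded by icl_seed for reproducibility
    icl_seed = 0
    shuffled = examples[:]
    random.Random(icl_seed).shuffle(shuffled)

    print(alternate)
    print(shuffled)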
The sampling_params dictionary contains the following schemes:
perturb — A dictionary containing the standard deviation, number of samples, and seed for Gaussian perturbations around test samples (defaults — std = 0.1, n_samples = 10000, perturb_seed = 'eval'). Note that setting perturb_seed to 'eval' will use the test point's index as the perturbation seed.
constant — An empty dictionary (no parameters in the current implementation), covering the case of fixed ICL samples for all test points.
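A rough sketch of the "perturb" scheme under these defaults (hypothetical code; the actual pipeline additionally applies the confidence filtering and class balancing from icl_params):

    # Sketch: Gaussian ICL perturbations around one test point under the default
    # sampling_params (std = 0.1, n_samples = 10000, perturb_seed = 'eval').
    import numpy as np

    std, n_samples, perturb_seed = 0.1, 10000, "eval"
    test_idx = 7              # index of the test point being explained
    x_test = np.zeros(13)     # placeholder test sample (13 features)

    seed = test_idx if perturb_seed == "eval" else perturb_seed
    rng = np.random.default_rng(seed)
    perturbations = x_test + rng.normal(0.0, std, size=(n_samples, x_test.shape[0]))

    # n_shot ICL examples are then sub-selected from these perturbations
    # (e.g. most confident and/or class balanced, per icl_params).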
The prompt_params dictionary contains the following parameters:
prompt_ID — The ID of the prompt in prompts.json (default — "pfpe2-topk")
k — The number of top-K features to request from the LLM. Use -1 for all features (default — 5)
hide_feature_details — Controls whether or not feature names and suffixes (e.g., "Age is 27 years" vs. "A is 27") are hidden (default — true)*
hide_test_sample — Hides the test sample being explained, showing only neighborhood perturbations (default — true)
hide_last_pred — Hides the last ICL example's prediction, used in Prediction-Based ICL (default — true)
use_soft_preds — Sets predictions to probability scores rather than labels (default — false)
rescale_soft_preds — If using soft predictions, rescales all predictions in the ICL to a 0-1 range (default — false)
n_round — Number of decimal places to round floats to in the prompt (default — 3)
input_str — String to prepend to each ICL input (default — "\nChange in Input: ")
output_str — String or list to prepend to each ICL output. For strings, use e.g., "Output — " (default — "\nChange in Output: ")
input_sep — Separator between blocks of ICL inputs (default — "\n")
output_sep — Separator between ICL inputs and ICL outputs (default — "")
feature_sep — Separator between blocks of <feature_name, feature_value> pairs (default — ", ")
value_sep — Separator between feature name and feature value (default — ": ")
add_explanation — Flag for adding explanations to the ICL prompt for Explanation-Based ICL (default — false)
num_explanations — Total number of explanations to sub-select ICL samples from, used in Explanation-Based ICL (default — 200)
*Note: The reply-parsing code only works for the case where feature names are hidden, i.e. the prompt looks like
"Change in Input: A: -0.081, B: -0.066, C: -0.103, D: -0.406, E: 0.098, F: -0.099, G: -0.044, H: 0.008, I: -0.064, J: 0.015, K: -0.155, L: 0.072, M: -0.123 Change in Output: 0"
rather than
"Change in Input: Age: -0.082, Final Weight: -0.043, Education Number: -0.099, Capital Gain: -0.327, Capital Loss: -0.112, Hours per Week: -0.106, Sex: -0.037, Workclass: -0.014, Marital Status: -0.018, Occupation: 0.039, Relationship: 0.073, Race: -0.022, Native Country: -0.113 Change in Output: -1".
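To see how the formatting parameters compose, here is a toy rendering of a single ICL example with the default separators (a sketch, not the repository's rendering code):

    # Sketch: rendering one ICL example with the default prompt_params separators.
    n_round = 3
    input_str, output_str = "\nChange in Input: ", "\nChange in Output: "
    feature_sep, value_sep = ", ", ": "

    feature_names = ["A", "B", "C"]     # hidden feature names (hide_feature_details = true)
    delta_x = [-0.081, -0.066, -0.103]  # change in input
    delta_y = 0                         # change in output

    features = feature_sep.join(
        f"{name}{value_sep}{round(value, n_round)}" for name, value in zip(feature_names, delta_x)
    )
    icl_line = f"{input_str}{features}{output_str}{delta_y}"
    print(icl_line)
    # prints:
    # Change in Input: A: -0.081, B: -0.066, C: -0.103
    # Change in Output: 0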
The experiment_params dictionary contains the following parameters:
use_min — Append the minute of the experiment start time to the experiment ID (default — true)
use_sec — Append the seconds of the experiment start time to the experiment ID (default — true)
use_microsec — Append the microseconds of the experiment start time to the experiment ID (default — false)
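A sketch of how these flags might extend a timestamp-based experiment ID (the repository's exact ID format may differ):

    # Sketch: building an experiment ID suffix from the start time, controlled by
    # use_min / use_sec / use_microsec. The repository's exact format may differ.
    from datetime import datetime

    use_min, use_sec, use_microsec = True, True, False
    now = datetime.now()

    experiment_id = now.strftime("%Y-%m-%d_%H")
    if use_min:
        experiment_id += now.strftime("-%M")
    if use_sec:
        experiment_id += now.strftime("-%S")
    if use_microsec:
        experiment_id += f"-{now.microsecond}"
    print(experiment_id)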
To evaluate explanations from a given LLM, run the following command:
python3 FaithfulnessPipeline.py
The parameters used for evaluating faithfulness metrics are as follows:
SEED — Seed value for reproducibility (default — 0)
data_name — Name of the dataset to use, e.g., "compas", "adult", etc. (default — "adult")
data_scaler — Data scaler method, e.g., "minmax" (default — "minmax")
model_name — Name of the model to use, e.g., "lr" (default — "lr")
base_model_dir — Directory of the saved model (default — "./models/ClassWeighted_scale_minmax/")
output_dir — Directory to read LLM results from (default — "./outputs/LLM_QueryAndReply/<experiment_ID>/")
LLM_topks_file_name — Path to the LLM top-Ks file (default — "_.pkl")
save_results — Whether to save faithfulness evaluations (default — true)
eval_min_idx — The minimum index for evaluation (default — 0)
eval_max_idx — The maximum index for evaluation (default — 100)
eval_topk_k — The number of top-K features to evaluate faithfulness on (default — 5)
LLM_top_k — The number of top-K features in the LLM's explanations (default — 5)
load_reply_strategy — File extension of the saved replies (default — "txt")
calculateAUC — Calculate AUC across all top-K scores, rather than for individual scores (default — true)
experiment_section — Set to "3.2" in order to parse LLM predictions as well as top-K values (default — "3.1")
perturbation_mean — Mean of the perturbation (default — 0.0)
perturbation_std — Standard deviation of the perturbation (default — 0.1)
perturb_num_samples — Number of perturbed samples to sub-select from (default — 10000)
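The faithfulness metrics themselves come from the OpenXAI code in the openxai folder (Agarwal et al. 2022). As a toy illustration of their flavor, a feature-agreement score between the LLM's top-K features and a reference top-K could be computed as follows (a sketch, not the pipeline's implementation):

    # Sketch: feature agreement (fraction of overlap) between the LLM's top-K features
    # and a reference top-K, in the spirit of the OpenXAI ground-truth faithfulness metrics.
    def feature_agreement(llm_topk, reference_topk, k=5):
        return len(set(llm_topk[:k]) & set(reference_topk[:k])) / k

    llm_topk = ["D", "C", "K", "M", "F"]        # parsed from the LLM reply
    reference_topk = ["D", "K", "C", "A", "F"]  # e.g. ranked |coefficients| of the LR model
    print(feature_agreement(llm_topk, reference_topk, k=5))  # 0.8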
To automatically compute faithfulness scores after generating LLM explanations, set the appropriate parameters in the LLM_pipeline_wrapper_experiments.py file, and run the following command:
python3 LLM_pipeline_wrapper_experiments.py
If you have questions/suggestions, please feel free to create GitHub issues.
Please consider citing if you find our codebase useful:
@misc{kroeger2024incontext,
title={In-Context Explainers: Harnessing LLMs for Explaining Black Box Models},
author={Nicholas Kroeger and Dan Ley and Satyapriya Krishna and Chirag Agarwal and Himabindu Lakkaraju},
year={2024},
eprint={2310.05797},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2310.05797},
}