ConformalLLM

Extending Conformal Prediction to LLMs

Read our paper here: Conformal Prediction with Large Language Models for Multi-Choice Question Answering.

Code Contributors: Charles Lu and Bhawesh Kumar

Code Organization

conformal_llm_scores.py is the Python script for classification using 1-shot question prompts. It outputs three files (a sketch for loading them follows the list):

  1. The softmax scores for each subject, for each of the 10 prompts.
  2. The accuracy of the MMLU-based 1-shot questions, as a dictionary where the key is the subject name and the value is a list containing the accuracy for each of the 10 prompts.
  3. The accuracy of the GPT-4-based 1-shot questions, as a dictionary with the same structure.
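
As a rough illustration, the three outputs can be loaded as below. The file names are placeholders; the actual names are set inside conformal_llm_scores.py (and the fork stores them as .json, see the fork notes).

    import json

    # Placeholder file names; check conformal_llm_scores.py for the real ones.
    with open("softmax_scores.json") as f:
        softmax_scores = json.load(f)  # per-subject softmax scores for the 10 prompts

    with open("mmlu_prompt_accuracy.json") as f:
        mmlu_acc = json.load(f)  # {subject_name: [accuracy for each of the 10 prompts]}

    with open("gpt4_prompt_accuracy.json") as f:
        gpt4_acc = json.load(f)

    # Example use: mean accuracy per subject for the MMLU-based prompts.
    mean_acc = {subject: sum(a) / len(a) for subject, a in mmlu_acc.items()}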

conformal.ipynb contains the results for all conformal prediction experiments and the GPT-4 vs. MMLU prompt comparison. It requires the three files output by conformal_llm_scores.py. To run the experiments, download llm_probs_gpt.zip, unzip it into your working directory, and then run conformal.ipynb.
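
For orientation, the core of split conformal prediction on softmax scores looks roughly like the sketch below. This is a minimal illustration of the standard procedure, not the notebook's exact implementation.

    import numpy as np

    def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
        # Nonconformity score on the calibration set: 1 - softmax of the true label.
        n = len(cal_labels)
        cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
        # Finite-sample-corrected quantile level (assumes n is large enough that it stays <= 1).
        q_level = np.ceil((n + 1) * (1 - alpha)) / n
        qhat = np.quantile(cal_scores, q_level, method="higher")
        # A choice enters the prediction set when its softmax is high enough.
        return test_probs >= 1.0 - qhat  # boolean mask of shape (n_test, n_choices)

Averaged over test questions, the resulting sets contain the true answer with probability at least 1 - alpha, and their size reflects the model's uncertainty.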

If you would like to run the experiments from scratch, apply for LLaMA access here, convert the original LLaMA weights to the Hugging Face format (refer here for instructions), and then run the conformal_llm_scores.py script. Requirements can be installed with:

pip install -r requirements.txt
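
Once converted, the weights can be loaded through transformers in the usual way. A sketch, where the path is whatever output directory you gave the conversion script:

    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer

    # Placeholder path; point it at the converted checkpoint.
    model_path = "path/to/llama-hf"

    tokenizer = LlamaTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
    model.eval()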

Fork notes

  • The original repository contained .pkl files, which have been converted to .json by pickle2json.py (a rough sketch of such a conversion follows).
  • Hugging Face model downloads can fill up the ~/.cache directory; keep an eye on its size (huggingface-cli scan-cache lists what is stored).
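
A minimal version of such a conversion might look like the following; the real pickle2json.py may differ, and the numpy handling is an assumption about what the pickles contain.

    import json
    import pickle
    from pathlib import Path

    import numpy as np

    def make_serializable(x):
        # JSON cannot encode numpy arrays directly, so convert them to lists.
        if isinstance(x, np.ndarray):
            return x.tolist()
        if isinstance(x, dict):
            return {k: make_serializable(v) for k, v in x.items()}
        if isinstance(x, (list, tuple)):
            return [make_serializable(v) for v in x]
        return x

    # Convert every .pkl in the working directory to a .json next to it.
    for pkl_path in Path(".").glob("*.pkl"):
        with open(pkl_path, "rb") as f:
            obj = pickle.load(f)
        with open(pkl_path.with_suffix(".json"), "w") as f:
            json.dump(make_serializable(obj), f)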