/rcps

Official codebase for "Distribution-Free, Risk-Controlling Prediction Sets"

Primary language: Python. License: MIT.

Paper

Distribution-Free, Risk-Controlling Prediction Sets

@article{bates-rcps,
  title={Distribution-Free, Risk-Controlling Prediction Sets},
  author={Bates, Stephen and Angelopoulos, Anastasios N and Lei, Lihua and Malik, Jitendra and Jordan, Michael I},
  journal={arXiv preprint arXiv:2101.02703},
  year={2021}
}

Basic Overview

For general information about RCPS, you can check our blog post. This repository contains the code we used for the experiments in the RCPS paper. Each experiment lives in its own folder, named after the experiment. The directory core contains code common to all of our experiments, including the implementations of the concentration bounds and the choice of lambda hat. The repository is still a work in progress; we will keep updating the code to make it more user-friendly and to remove clutter left over from development. If you have trouble reproducing our results, please email angelopoulos@berkeley.edu.
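To make the role of the concentration bounds and lambda hat concrete, here is a minimal sketch of the calibration step on a finite grid of lambda values. It is not the code in core: the function names hoeffding_ucb and choose_lambda_hat are hypothetical, the bound is plain Hoeffding (the repository may use tighter bounds), and the losses are assumed to lie in [0, 1] and to be nonincreasing in lambda.

```python
import math
from typing import Optional, Sequence


def hoeffding_ucb(mean_loss: float, n: int, delta: float) -> float:
    """Hoeffding upper confidence bound on the risk for losses in [0, 1].

    Holds with probability at least 1 - delta over n i.i.d. calibration points.
    """
    return mean_loss + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def choose_lambda_hat(
    losses: Sequence[Sequence[float]],
    lambdas: Sequence[float],
    alpha: float,
    delta: float,
) -> Optional[float]:
    """Pick the smallest grid value whose bound, and the bound of every
    larger grid value, is below the target risk level alpha.

    losses[i][j] is the loss of calibration point i at lambdas[j];
    lambdas must be sorted in increasing order (an assumption of this sketch).
    Returns None when even the largest lambda fails the check.
    """
    n = len(losses)
    lambda_hat = None
    # Scan from the largest lambda downward and stop at the first failure,
    # so every lambda at or above the returned value passes the bound.
    for j in range(len(lambdas) - 1, -1, -1):
        mean_loss = sum(row[j] for row in losses) / n
        if hoeffding_ucb(mean_loss, n, delta) >= alpha:
            break
        lambda_hat = lambdas[j]
    return lambda_hat
```

For example, with 1000 calibration points, delta = 0.1, and alpha = 0.15, the Hoeffding margin is about 0.034, so a grid point with empirical risk 0.10 passes while one with empirical risk 0.30 does not. Restricting the search to a grid is a simplification made for this sketch.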

Getting Started

We store some large files in our git repo via git-lfs; you may need to install and configure it from here. After installing git-lfs, you can clone this repository. Then, you can create the rcps conda environment by running the following line:

conda create --name rcps --file ./requirements.txt 

Each experiment requires different datasets. For the ./imagenet and ./hierarchical_imagenet experiments, you will need to point the scripts towards the val directory of your local copy of the ImageNet dataset. Similarly, for ./coco, you need to point the scripts towards your local copy of the 2017 version of MS COCO, available here. The ./polyps and ./proteins experiments require a bit more work, described below.

Polyp data

We used data from five different datasets: HyperKvasir-SEG, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and ETIS-LaribPolypDB. Download each of these datasets and unzip them into the folder ./polyps/PraNet/data/TestDataset/{datasetname}. Then run the script ./polyps/PraNet/process_all_data.py, which should store the outputs of the tumor prediction model in the proper directory so you can run our experiments.
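Before running process_all_data.py, it can help to confirm that all five datasets are in place. The following is a minimal sketch of such a check, not part of the repository: the helper missing_polyp_datasets is hypothetical, and it assumes each folder is named exactly after its dataset as listed above, which may not match the names the script expects.

```python
from pathlib import Path
from typing import List

# Folder names are an assumption: one folder per dataset, named as in the text.
POLYP_DATASETS = [
    "HyperKvasir-SEG",
    "CVC-ClinicDB",
    "Kvasir-SEG",
    "CVC-ColonDB",
    "ETIS-LaribPolypDB",
]


def missing_polyp_datasets(
    root: str = "./polyps/PraNet/data/TestDataset",
) -> List[str]:
    """Return the dataset names that have no folder under root."""
    root_path = Path(root)
    return [name for name in POLYP_DATASETS if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_polyp_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All five polyp dataset folders found.")
```

Run from the repository root, this prints any dataset folders that still need to be downloaded and unzipped.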

Protein data

For the AlphaFold v1 experiments in ./proteins, you can point the scripts to the AlphaFold CASP-13 test set, available here.

License

MIT License