This repo contains the main source code for the experiments and analysis in the paper "Characterizing the Representation Disparity of Differential Privacy". It also links to the separate repositories that hold some of the paper's supplementary experiments.
Configure the environment by running:
pip3 install -r REQUIREMENTS.txt
For our experiments, we use Python 3.6.7 and a single NVIDIA RTX 2080 Ti GPU with 11GB of RAM.
To run a training experiment, use train.py and the --params flag to provide a path to a .yaml file containing the experiment parameters. For example:
python3 train.py --params params/params_celeba_nodp.yaml
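For readers unfamiliar with this pattern, the sketch below shows one common way a script can accept a --params flag and read the .yaml file it points to. It is an illustration only, not the code in train.py: the function names parse_args and load_params are hypothetical, and it assumes PyYAML is installed.

```python
import argparse

import yaml  # PyYAML; assumed to be available in the environment


def parse_args(argv=None):
    """Parse the command line; --params is the path to a .yaml file."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument(
        "--params",
        required=True,
        help="path to a .yaml file containing the experiment parameters",
    )
    return parser.parse_args(argv)


def load_params(path):
    """Read the YAML file at `path` and return its contents as a dict."""
    with open(path, "r") as f:
        params = yaml.safe_load(f)
    # A parameter file is expected to be a mapping of names to values.
    if not isinstance(params, dict):
        raise ValueError(f"{path} does not contain a YAML mapping")
    return params


def main(argv=None):
    """Load the parameters and print them; a real script would start training here."""
    args = parse_args(argv)
    params = load_params(args.params)
    for name, value in sorted(params.items()):
        print(f"{name}: {value}")
    return params
```

Calling main(["--params", "params/params_celeba_nodp.yaml"]) would mirror the command above. The keys inside the file are whatever the experiment defines; this sketch makes no assumption about them.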
This repository uses the following datasets:
- MNIST (included in torchvision.datasets)
- CIFAR-10 (included in torchvision.datasets)
- CelebA (link)
- Labeled Faces in the Wild (LFW) (link)
- German Credit (link)
- COMPAS (link)
- Adult Income (link)
We also generate the MMNIST and MC10 datasets using the original MNIST and CIFAR-10 datasets. Details of the dataset generation process are described in the paper. The classes implementing these datasets are in utils/mmnist_dataset.py and utils/mc10_dataset.py, respectively. Samples from the datasets are shown below.
We gratefully acknowledge the authors of the paper "Differential Privacy Has Disparate Impact on Model Accuracy" (arXiv) for providing their open-source code, which was used to build this repository.
As in [4], we use compute_dp_sgd_privacy.py, copied from the public repo.
Except where otherwise indicated, we use the implementation of DPSGD from [4], which is based on the TF Privacy repo and on [1], [2], and [3] below.
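As background, the core DPSGD update described in [1] can be sketched as follows: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping norm, and average. This is a minimal dependency-free illustration on plain Python lists, not the implementation from [4] that this repository uses. The names clip_gradient and dp_sgd_step and their arguments are hypothetical.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale `grad` down so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng=random):
    """One DPSGD update on a flat parameter vector (illustrative only)."""
    batch_size = len(per_example_grads)
    # 1. Clip every per-example gradient independently.
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients coordinate by coordinate.
    summed = [sum(column) for column in zip(*clipped)]
    # 3. Add Gaussian noise with standard deviation noise_multiplier * clip_norm,
    #    then divide by the batch size to get a noisy average gradient.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    # 4. Take an ordinary gradient descent step with the noisy average.
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Example: two examples, two parameters, seeded noise for repeatability.
example_rng = random.Random(0)
updated = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    lr=0.1,
    clip_norm=1.0,
    noise_multiplier=1.1,
    rng=example_rng,
)
```

Clipping bounds how much any single example can move the parameters, which is what lets the added noise translate into a privacy guarantee. Setting noise_multiplier to 0 reduces the step to SGD with clipped gradients and provides no privacy.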
[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In CCS, 2016.
[2] H. B. McMahan and G. Andrew. A general approach to adding differential privacy to iterative training procedures. arXiv:1812.06210, 2018.
[3] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang. Learning differentially private recurrent language models. In ICLR, 2018.
[4] E. Bagdasaryan, V. Shmatikov. Differential Privacy Has Disparate Impact on Model Accuracy. In NeurIPS, 2019.