RKHS-private-database

Code to reproduce the ICML 2018 paper "Differentially Private Database Release via Kernel Mean Embeddings".

Primary language: Python. License: MIT.

Differentially Private Database Release via Kernel Mean Embeddings

Matej Balog, Ilya Tolstikhin, Bernhard Schölkopf

35th International Conference on Machine Learning (ICML 2018)

[PDF] [arXiv]

This repository contains the scripts needed to reproduce the experiments in the paper.

Setup

Conda environment setup:

conda create -n RKHS-private-database python=3.6.3 matplotlib=2.1.0 numpy=1.13.3 pytorch=0.2.0 scikit-learn=0.19.0
source activate RKHS-private-database
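
Once the environment is active, a quick check that the pinned versions are the ones actually being imported can save debugging time later. The snippet below is a minimal sketch and not part of the repository: the helper name `check_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute. It compares by prefix because some builds append a suffix to the version string.

```python
import importlib

# Versions pinned in the conda command above; "torch" and "sklearn" are the
# import names of the pytorch and scikit-learn packages.
PINNED = {
    "matplotlib": "2.1.0",
    "numpy": "1.13.3",
    "torch": "0.2.0",
    "sklearn": "0.19.0",
}


def check_versions(pinned):
    """Return {module: (expected, found)} for missing or mismatched modules."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            module = importlib.import_module(name)
            found = getattr(module, "__version__", None)
        except ImportError:
            found = None  # module is not installed in this environment
        # Prefix match, since a build may append a suffix to the version.
        if found is None or not found.startswith(expected):
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in sorted(check_versions(PINNED).items()):
        print("{}: expected {}, found {}".format(name, expected, found))
```

An empty result means every pinned package was found with the expected version prefix.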

Data generation

Two synthetic data files were used to generate the plots in the paper:

  • D=2: data/mixture_of_Gaussians_N100000_D2{.npz, .json}
  • D=5: data/mixture_of_Gaussians_N100000_D5{.npz, .json}

You can re-generate these files yourself by executing:

python data.py 100000 2
python data.py 100000 5
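
After generating the files, it can help to confirm what they contain before running any experiments. The following is a minimal sketch, not part of the repository: the helper name `describe_npz` is hypothetical, and because the array names inside the `.npz` archive and the layout of the `.json` file are not documented here, the code lists whatever it finds instead of assuming specific keys.

```python
import json
import os

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


if __name__ == "__main__":
    # Path taken from the file list above; adjust to your working directory.
    base = "data/mixture_of_Gaussians_N100000_D2"
    if os.path.exists(base + ".npz"):
        for name, (shape, dtype) in describe_npz(base + ".npz").items():
            print(name, shape, dtype)
    if os.path.exists(base + ".json"):
        with open(base + ".json") as f:
            metadata = json.load(f)
        # The JSON layout is not documented here; show only top-level keys or type.
        print(sorted(metadata) if isinstance(metadata, dict) else type(metadata))
```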

Experiments

Figure 1 ("Publishable subset" experiments)

The results of the experiments shown in Figure 1 are stored in two files:

  • D=2: results/D2_alg1_leak_M10000.json
  • D=5: results/D5_alg1_leak_M10000.json

You can re-generate these files by re-running the respective experiments as follows (the ../ prefixes imply that the commands are run from a subdirectory of the repository):

python experiments.py ../data/mixture_of_Gaussians_N100000_D2 leak --M 10000 1
python experiments.py ../data/mixture_of_Gaussians_N100000_D5 leak --M 10000 1

To then re-generate the plots shown in Figure 1, execute:

python plot.py --alg1 ../results/D2_alg1_leak_M10000.json --path_save ../figures/leaksD2
python plot.py --alg1 ../results/D5_alg1_leak_M10000.json --path_save ../figures/leaksD5

Figure 1 plots: figures/leaksD2 (D=2) and figures/leaksD5 (D=5).

Figure 2 ("No publishable subset" experiments)

To re-run the experiments shown in Figure 2:

python experiments.py ../data/mixture_of_Gaussians_N100000_D2 random --M 10000 1
python experiments.py ../data/mixture_of_Gaussians_N100000_D5 random --M 10000 1
python experiments.py ../data/mixture_of_Gaussians_N100000_D2 random --M 10000 2
python experiments.py ../data/mixture_of_Gaussians_N100000_D5 random --M 10000 2
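
The four runs above differ only in the data dimension and the trailing argument, so they can be generated programmatically. The sketch below is not part of the repository, and the helper names `experiment_command` and `results_path` are hypothetical. It assumes that the trailing positional argument selects the algorithm (inferred from the `alg1`/`alg2` part of the result file names) and that result files follow the naming pattern seen in this README.

```python
import itertools


def experiment_command(dim, mode, algorithm, m=10000, n=100000):
    """Build the argv list for one experiments.py run, mirroring the commands above."""
    data_path = "../data/mixture_of_Gaussians_N{}_D{}".format(n, dim)
    return ["python", "experiments.py", data_path, mode, "--M", str(m), str(algorithm)]


def results_path(dim, mode, algorithm, m=10000):
    """Result file name, following the pattern inferred from this README."""
    return "../results/D{}_alg{}_{}_M{}.json".format(dim, algorithm, mode, m)


if __name__ == "__main__":
    # Algorithms 1 and 2, each on the D=2 and D=5 data sets.
    for algorithm, dim in itertools.product((1, 2), (2, 5)):
        command = experiment_command(dim, "random", algorithm)
        print(" ".join(command), "->", results_path(dim, "random", algorithm))
```

To launch the runs instead of printing them, each list can be passed to `subprocess.run(command, check=True)` from the directory that contains `experiments.py`.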

To then re-generate the plots shown in Figure 2, execute:

python plot.py --alg1 ../results/D2_alg1_random_M10000.json --alg2 ../results/D2_alg2_random_M10000.json --path_save ../figures/nodataD2
python plot.py --alg1 ../results/D5_alg1_random_M10000.json --alg2 ../results/D5_alg2_random_M10000.json --path_save ../figures/nodataD5

Figure 2 plots: figures/nodataD2 (D=2) and figures/nodataD5 (D=5).

BibTeX

@inproceedings{balog2018privacy,
  author = {Balog, Matej and Tolstikhin, Ilya and Sch{\"o}lkopf, Bernhard},
  title = {Differentially {Private} {Database} {Release} via {Kernel} {Mean} {Embeddings}},
  booktitle = {35th International Conference on Machine Learning (ICML)},
  year = {2018},
  month = {July}
}