/LDP_Protocols

Sample LDP implementation in Python

Primary LanguagePythonMIT LicenseMIT

This repository lists the prototype algorithms our group developed for data collection/analysis under local differential privacy and is maintained by Tianhao Wang.

Protocols

OLH

Frequency Oracle (primitive to estimate the histograms)

Related Paper: Locally Differentially Private Protocols for Frequency Estimation

Extended-OLH for Sparse Aggregation

Related Paper: Locally Differentially Private Sparse Vector Aggregation

Source code is to be open sourced.

Square Wave

Density Oracle (for numerical/ordinal values)

Related Paper: Estimating Numerical Distributions under Local Differential Privacy

Clarification: Citation 33 should be Ning Wang et al. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. ICDE 2019.

SVSM

Frequent Itemset Mining under LDP

Related Paper: Locally Differentially Private Frequent Itemset Mining

Errata: In Equation (10) of Section V, there are three terms, two of them misses the coefficient $\ell$.

Clarification: To find top-k itemsets, we also consider singleton estimates from SVIM (the method for singleton mining).

Post-Porcessing

A list of Post-Porcess Methods for LDP

Related Paper: Locally Differentially Private Frequency Estimation with Consistency

ToPL

Publish streaming data under LDP

Related Paper: Continuous Release of Data Streams under both Centralized and Local Differential Privacy

Code available at this repository.

HDG

Multi-Dimensional Range Query under LDP

Related Paper: Answering Multi-Dimensional Range Queries under Local Differential Privacy

Code available at this repository.

I am slowly cleaning code for the protocols below:

PEM

Heavy Hitter Identification

Related Paper: Locally Differentially Private Heavy Hitter Identification

Errata: For the AOL dataset, there are 0.12M, instead of 0.2M unique queries. This is a typo that does not change any result.

Clarification: For the QUANTCAST data, I downloaded the data for one month (which contains 10 billion clicks), and sample for 5min (i.e., divide the # clicks by 30 * 24 * 5 * 60). The dataset is available upon request.

HIO

Multi-Dimensional Analytics

Related Paper: Multi-Dimensional Analytics Related Paper: Answering Multi-Dimensional Analytical Queries under Local Differential Privacy

CALM

Marginal Estimation

The source code is not opened yet, but the similar code (plus a data synthesizing component) for the central DP setting is opened at DPSyn (related info at nist challenge 1) and DPSyn2 (related info at nist challenge 2).

Related Paper: CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy

MURS

Shuffler Model

Related Paper: Improving Utility and Security of the Shuffler-based Differential Privacy

DP-FL-GBDT

Training GBDTs in the Federated Model

Related Paper: Federated Boosted Decision Trees with Differential Privacy Source code is open sourced at Sam's repo

DP-VFL-Clustering

Clustering in the Vertical Federated Model

Related Paper: Differentially Private Vertical Federated Clustering (accepted to VLDB 23) Source code is to be open sourced.

Environment

I mainly used Python 3 with numpy. Although not tested, the code should be compatible with any recent version. I also use a special package called xxhash for high hashing throughput. It can be changed to the builtin hash function. In case your local environment does not work, this is the package versions I used from my local side.

Python 3.6

xxhash 1.0.1

numpy 1.11.3

pytest 3.4.0

Or, run

pip install -r requirements.txt
pytest