This repository lists the prototype algorithms our group developed for data collection/analysis under local differential privacy and is maintained by Tianhao Wang.
Frequency Oracle (primitive to estimate the histograms)
Related Paper: Locally Differentially Private Protocols for Frequency Estimation
Related Paper: Locally Differentially Private Sparse Vector Aggregation
Source code is to be open sourced.
Density Oracle (for numerical/ordinal values)
Related Paper: Estimating Numerical Distributions under Local Differential Privacy
Clarification: Citation 33 should be Ning Wang et al. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. ICDE 2019.
Frequent Itemset Mining under LDP
Related Paper: Locally Differentially Private Frequent Itemset Mining
Errata: In Equation (10) of Section V, there are three terms, two of them misses the coefficient
Clarification: To find top-k itemsets, we also consider singleton estimates from SVIM (the method for singleton mining).
A list of Post-Porcess Methods for LDP
Related Paper: Locally Differentially Private Frequency Estimation with Consistency
Publish streaming data under LDP
Related Paper: Continuous Release of Data Streams under both Centralized and Local Differential Privacy
Code available at this repository.
Multi-Dimensional Range Query under LDP
Related Paper: Answering Multi-Dimensional Range Queries under Local Differential Privacy
Code available at this repository.
Heavy Hitter Identification
Related Paper: Locally Differentially Private Heavy Hitter Identification
Errata: For the AOL dataset, there are 0.12M, instead of 0.2M unique queries. This is a typo that does not change any result.
Clarification: For the QUANTCAST data, I downloaded the data for one month (which contains 10 billion clicks), and sample for 5min (i.e., divide the # clicks by 30 * 24 * 5 * 60). The dataset is available upon request.
Multi-Dimensional Analytics
Related Paper: Multi-Dimensional Analytics Related Paper: Answering Multi-Dimensional Analytical Queries under Local Differential Privacy
Marginal Estimation
The source code is not opened yet, but the similar code (plus a data synthesizing component) for the central DP setting is opened at DPSyn (related info at nist challenge 1) and DPSyn2 (related info at nist challenge 2).
Related Paper: CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy
Shuffler Model
Related Paper: Improving Utility and Security of the Shuffler-based Differential Privacy
Training GBDTs in the Federated Model
Related Paper: Federated Boosted Decision Trees with Differential Privacy Source code is open sourced at Sam's repo
Clustering in the Vertical Federated Model
Related Paper: Differentially Private Vertical Federated Clustering (accepted to VLDB 23) Source code is to be open sourced.
I mainly used Python 3 with numpy. Although not tested, the code should be compatible with any recent version. I also use a special package called xxhash for high hashing throughput. It can be changed to the builtin hash function. In case your local environment does not work, this is the package versions I used from my local side.
Python 3.6
xxhash 1.0.1
numpy 1.11.3
pytest 3.4.0
Or, run
pip install -r requirements.txt
pytest