This repository contains the source code corresponding to our paper titled "Approximate Kernel Density Estimation under Metric-based Local Differential Privacy". All algorithms are implemented in Python 3.
-
small_datasets/
: Provides sampled example datasets for the CodRNA, CovType, RCV1, Yelp, and SYN (default) datasets mentioned in the paper. -
generate_syn_datasets.py
: Contains code to generate a series of SYN datasets.
RACE.py
,DM.py
,PM.py
,SW.py
,GI.py
,FKM_LL_RACE.py
,FKM_LR_RACE.py
andmLDP_KDE.py
: Store the code for algorithms RACE, DM, PM, SW, GI, FKM-LL-RACE, FKM-LR-RACE and mLDP-KDE, respectively.examples.py
: Provides an example of the mLDP-KDE algorithm with 1000 construction points and 1 query point.
Python scripts are provided to run all experiments.
-
test_epsilon_utility_efficiency.py
: Contains code for Expt1 - Utility vs. Privacy and Expt2 - Time Efficiency. -
test_epsilon_sketchsize_communication.py
andtest_sketchsize_MSE.py
: Contain code for Expt3 - Sketch Size and Communication Cost. -
test_m_utility_efficiency.py
andtest_n_utility_efficiency.py
: Contain code for Expt4 - Scalability. -
test_epsilon_utility_L1.py
andtest_epsilon_utility_Angular.py
: Contain code for Expt5 - Performance on Other LSH Kernels. - To reproduce results:
- Run any of the experimental scripts simply by using
python scripts_name.py
. - In scripts requiring dataset selection, modify the value of
selected_flag
to generate results for the chosen dataset. - In scripts requiring privacy radius
$r$ selection, modify the value ofnearest_flag
andL_R_set
to generate results for the chosen$r$ .
- Run any of the experimental scripts simply by using
plotting_tools.py
: Provided for visualization. It requires Python 3.7 (or higher versions) and Matplotlib.
-
Additional Experiments:
-
test_small_range_epsilon_utility.py
: Contains code to test the utility of the RACE, GI and mLDP-KDE algorithms using a smaller$\varepsilon$ range: [1, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20]. -
test_r_effect.py
: Contains code to evaluate the impact of a range of privacy radius$r$ on the mLDP-KDE algorithm. We tested two classes of$r$ :- average distance from a point to its t-nearest neighbors for t ∈ {1, 10, 100, 1000, 10000};
- maximum distance from a point to its t-nearest neighbors for t ∈ {1, 10, 100, 1000, 10000};
-
heatmap_visualization.py
: Contains code to visualizing 2D Heatmaps for KDE on each dataset.
-
- Update
mLDP_KDE.py
: Added comments to enhance code clarity. - Update
parameters.py
: Added a set of parameters for new experiments. - Update
plotting_tools.py
: Added functionsdraw_heatmap
anddraw_small_range_epsilon_MSE
for new experiments.