In today's data-driven world, researchers and data scientists rely on data analytics to build better models and new solutions. These models often handle sensitive or personal data, which raises privacy concerns. For example, some AI models can memorize details of the data they were trained on and leak those details later. Differential privacy is a mathematical framework for measuring this privacy leakage and limiting it.
This is where PyDP comes in. PyDP is a Python wrapper for Google's Differential Privacy project. The library provides a set of ε-differentially private algorithms that produce aggregate statistics over numeric data sets containing private or sensitive information, so those statistics can be computed from Python with less risk of exposing any individual record.
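For readers new to the term, the standard textbook definition of ε-differential privacy is given here for reference. The symbols are notation chosen for this note, not names from PyDP: M is a randomized algorithm, D and D' are any two data sets that differ in one record, and S is any set of possible outputs.

```latex
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \cdot \Pr[\, M(D') \in S \,]
```

A smaller ε forces the two probabilities closer together, so the output reveals less about whether any single record was present.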
Things to remember about PyDP:
- 🚀 Features differentially private algorithms including: BoundedMean, BoundedSum, Max, Count Above, Percentile, Min, Median, etc.
- All the computation methods mentioned above use Laplace noise only. (Other noise mechanisms will be added soon... 😃)
- 🔥 Currently supports Linux and macOS. (Windows coming real soon... 😃)
- ⭐ Supports all Python 3+ versions.
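To give a feel for what "Laplace noise" means in the list above, here is a minimal pure-Python sketch of the Laplace mechanism applied to a count. It is an illustration only and not PyDP's implementation: the function names `laplace_noise` and `noisy_count` are made up for this example, and the standard `random` module is not a secure noise source.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample (illustrative helper, not a PyDP function)."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values, epsilon: float, rng: random.Random) -> float:
    """Return len(values) plus Laplace noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return len(values) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed only so the demo is repeatable
records = list(range(1000))
for eps in (0.1, 1.0):
    # Smaller epsilon means a larger noise scale, so the result strays further from 1000.
    print(eps, round(noisy_count(records, eps, rng), 2))
```

The design point to notice is the trade-off: the noise scale is the sensitivity divided by epsilon, so stronger privacy (smaller epsilon) costs accuracy.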
Use the package manager pip to install PyDP.
pip install python-dp
Refer to this example to understand PyDP library usage.
For a walkthrough with code, refer to the Jupyter Notebook or the Python file for the carrot demo.
Documentation can be found here.
A sample of usage can be found below:
import pydp as dp # imports the DP library
from pydp.algorithms.laplacian import BoundedMean
# To calculate the Bounded Mean
# epsilon is a positive number denoting the privacy threshold
# It measures the acceptable loss of privacy (smaller values mean less loss is acceptable)
# If both the lower and upper bounds are specified,
# x = BoundedMean(epsilon: double, lower: int, upper: int)
x = BoundedMean(0.6, 1, 10)
# If lower and upper bounds are not specified,
# DP library automatically calculates these bounds
# x = BoundedMean(epsilon: double)
x = BoundedMean(0.6)
# To get the result
# Currently supported data types are integer and float. Future versions will support additional data types
# Refer to examples/carrots.py for an introduction
# x.quick_result(input_data: list)
input_data = [1, 4, 7, 9, 10]  # made-up sample values for illustration
result = x.quick_result(input_data)
print(result)
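To show why the lower and upper bounds in the sample above matter, here is a simplified pure-Python sketch of a bounded mean. It is not the algorithm PyDP or the underlying Google library uses. It assumes the number of records is public, so only the sum is noised, and the names `clamp` and `sketch_bounded_mean` are hypothetical.

```python
import random


def clamp(value: float, lower: float, upper: float) -> float:
    """Force value into the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def sketch_bounded_mean(values, epsilon, lower, upper, rng):
    """Simplified bounded mean (assumption: the record count is public)."""
    if epsilon <= 0 or lower >= upper or not values:
        raise ValueError("need epsilon > 0, lower < upper, and at least one value")
    # Clamping caps how much any single record can influence the sum.
    clamped = [clamp(v, lower, upper) for v in values]
    # Replacing one record changes the clamped sum by at most (upper - lower),
    # so that is the sensitivity used to scale the Laplace noise.
    scale = (upper - lower) / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    noisy_mean = (sum(clamped) + noise) / len(clamped)
    # Clamping the output is post-processing and does not weaken the guarantee.
    return clamp(noisy_mean, lower, upper)


rng = random.Random(0)  # fixed seed only so the demo is repeatable
carrots = [3, 5, 8, 2, 9, 400] * 200  # made-up data; 400 is an outlier
print(sketch_bounded_mean(carrots, epsilon=0.6, lower=1, upper=10, rng=rng))
```

Without bounds, a single extreme value such as 400 could move the mean arbitrarily far, and no finite amount of noise would hide it. Tighter bounds mean less noise but more distortion from clamping.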
Good learning resources for getting started with the Python Differential Privacy (PyDP) project and understanding the concepts behind it can be found here.
For support in using this library, please join the #lib_pydp Slack channel. If you’d like to follow along with any code changes to the library, please join the #code_dp_python Slack channel. Click here to join our Slack community!
If you'd like to contribute to this project, please read these guidelines.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.