DP quantiles fail with a RuntimeError
TedTed opened this issue · 1 comments
Hi folks,
I started from the basic data analysis notebook and wanted to try out quantile computation with the exponential mechanism.
I slightly modified the 10th code cell of the notebook to change:
sn.dp_mean(
    data = sn.to_float(data['age']),
    privacy_usage = {'epsilon': .65},
    data_lower = 0.,
    data_upper = 100.,
    data_rows = 1000
)
to:
sn.dp_median(
    data = sn.to_float(data['age']),
    candidates = [float(i) for i in range(100)],
    mechanism = "Exponential",
    privacy_usage = {'epsilon': .65},
    data_lower = 0.,
    data_upper = 100.,
    data_rows = 1000
)
Executing this cell raises the following error:
RuntimeError: Error: node specification ExponentialMechanism(ExponentialMechanism { privacy_usage: [PrivacyUsage { distance: Some(Approximate(DistanceApproximate { epsilon: 0.65, delta: 0.0 })) }] }):
Caused by: custom sensitivities may only be passed if protect_sensitivity is disabled
This probably shouldn't happen (presumably the quantile mechanism should work out for itself which sensitivity to pass to the exponential mechanism?). The error message is also misleading: passing protect_sensitivity = False to sn.Analysis doesn't solve the issue, but raises a different error:
RuntimeError: Error: node specification ExponentialMechanism(ExponentialMechanism { privacy_usage: [PrivacyUsage { distance: Some(Approximate(DistanceApproximate { epsilon: 0.65, delta: 0.0 })) }] }):
Caused by: sensitivity has 1 records, while the expected shape has 100 records.
Thanks for raising this issue. We've been moving away from this library, and I can't get a deprecation notice on it soon enough. The notebook you're using hasn't been maintained and, as you've pointed out, a regression broke the median. I recommend using the OpenDP library instead. Admittedly, the OpenDP library doesn't have a quantiles implementation yet, but a couple of different algorithms are in development.
Here's a modification to the cell. The runtime error occurs when trying to compute the privacy budget, but you can still make a release.
with sn.Analysis() as analysis:
    # load data
    data = sn.Dataset(path = data_path, column_names = var_names)
    # get median of age
    age_median = sn.dp_median(
        data = sn.to_float(data['age']),
        candidates = [float(i) for i in range(100)],
        privacy_usage = {'epsilon': .65},
        data_lower = 0.,
        data_upper = 100.)
print("DP median of age: {0}".format(age_median.value))
# explodes:
# print("Privacy usage: {0}\n\n".format(analysis.privacy_usage))
In any other situation I would debug the issue and extend the test suite. But I think we'd be better off if I spent that time opening PRs for this algorithm under the OpenDP library instead.
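For anyone landing here who just wants to see what an exponential-mechanism median over a candidate list does, here is a minimal standalone sketch in plain Python. It is not the SmartNoise or OpenDP implementation. The function name dp_median_exponential is made up for illustration, the utility function (negative rank distance to n/2) is one common textbook choice, and the sensitivity of 1 assumes neighbouring datasets differ by substituting a single record. It also uses Python's ordinary random module, which is not suitable for real privacy-critical releases.

```python
import bisect
import math
import random


def dp_median_exponential(data, candidates, epsilon, rng=None):
    """Pick one candidate with the exponential mechanism (illustrative sketch only)."""
    if rng is None:
        rng = random.Random()
    ordered = sorted(data)
    n = len(ordered)
    # Utility of a candidate: negative distance between the number of
    # records strictly below it and n/2 (the rank of the median).
    utilities = [-abs(bisect.bisect_left(ordered, c) - n / 2.0) for c in candidates]
    # Assumption: substituting one record moves any rank by at most 1.
    sensitivity = 1.0
    # Subtract the best utility before exponentiating for numerical stability.
    top = max(utilities)
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    # Sample a candidate with probability proportional to its weight.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Synthetic stand-in for the 'age' column (the real notebook data is not used here).
rng = random.Random(0)
ages = [float(rng.randrange(0, 100)) for _ in range(1000)]
candidates = [float(i) for i in range(100)]
estimate = dp_median_exponential(ages, candidates, epsilon=0.65, rng=rng)
print("DP median of age: {0}".format(estimate))
```

Note that the mechanism needs one utility score per candidate (100 here) but only a single sensitivity bound for the utility function as a whole.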