microsoft/prv_accountant

Discrete mean differs from continuous mean significantly

xuefeng-xu opened this issue · 10 comments

I ran the DPSGD example with sampling_probability=0.125 and got an error saying "Discrete mean differs from continuous mean significantly". Could you please explain why that is?

from prv_accountant.dpsgd import DPSGDAccountant

accountant = DPSGDAccountant(
    noise_multiplier=0.8,
    sampling_probability=0.125,
    eps_error=0.1,
    delta_error=1e-10,
    max_steps=1000
)

eps_low, eps_estimate, eps_upper = accountant.compute_epsilon(delta=1e-6, num_steps=1000)
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    accountant = DPSGDAccountant(
  File "/Users/***/miniconda3/envs/torch/lib/python3.8/site-packages/prv_accountant/dpsgd.py", line 23, in __init__
    super().__init__(prvs=PoissonSubsampledGaussianMechanism(noise_multiplier=noise_multiplier,
  File "/Users/***/miniconda3/envs/torch/lib/python3.8/site-packages/prv_accountant/accountant.py", line 87, in __init__
    dprvs = [discretisers.CellCentred().discretise(tprv, domain) for tprv in tprvs]
  File "/Users/***/miniconda3/envs/torch/lib/python3.8/site-packages/prv_accountant/accountant.py", line 87, in <listcomp>
    dprvs = [discretisers.CellCentred().discretise(tprv, domain) for tprv in tprvs]
  File "/Users/***/miniconda3/envs/torch/lib/python3.8/site-packages/prv_accountant/discretisers.py", line 50, in discretise
    raise RuntimeError("Discrete mean differs from continuous mean significantly.")
RuntimeError: Discrete mean differs from continuous mean significantly.

Me too. Has the original poster resolved the issue?

This typically happens when parameters are used that result in very large epsilons. If I run compute-dp-epsilon -s 0.8 -p 0.1 -d 1e-6 -i 1000 I already get epsilons larger than 40, which is not a very meaningful privacy protection. Were these the parameters you were interested in specifically?

@wulu473 No, I just randomly tested different values of sampling_probability to see the epsilons; the other params stay the same as in the readme. I wonder if there is a way to resolve this, i.e., returning the epsilons instead of throwing an error?
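
As a stopgap, one can catch the RuntimeError explicitly rather than letting it propagate. A minimal sketch using the parameters from the original report (this only avoids the crash; it does not recover an epsilon estimate):

from prv_accountant.dpsgd import DPSGDAccountant

try:
    accountant = DPSGDAccountant(
        noise_multiplier=0.8,
        sampling_probability=0.125,
        eps_error=0.1,
        delta_error=1e-10,
        max_steps=1000
    )
    eps_low, eps_estimate, eps_upper = accountant.compute_epsilon(delta=1e-6, num_steps=1000)
    print(eps_low, eps_estimate, eps_upper)
except RuntimeError as e:
    # Raised by the discretiser: "Discrete mean differs from continuous mean significantly."
    print(f"Accountant could not discretise the PRV for these parameters: {e}")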

Hi!
I am running into the same issue and the error message is not really helpful. If, as @wulu473 says, it's because epsilon is considered too loose, the message should state that precisely. Also, even if an epsilon of, say, 50 is not up to industry standards, Opacus should still allow experimentation with such a large value.

Thanks for the feedback. I agree that the error message may be hard to understand without knowing the implementation. I can look into making the discretization step more robust for unrealistically high epsilons if this is something people are running into repeatedly. At the very least I'll update the error message.
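
For context, here is a conceptual sketch (not the library's actual code) of the kind of sanity check that raises this error: after discretising the PRV onto a grid, the mean of the discrete distribution is compared against the continuous mean, and a large discrepancy aborts the computation. The function name and tolerance below are hypothetical.

import numpy as np

def check_discretised_mean(grid, probs, continuous_mean, tol=1e-3):
    # Mean of the discretised PRV on the given grid
    discrete_mean = float(np.dot(grid, probs))
    # If discretisation distorted the distribution too much, refuse to continue
    if abs(discrete_mean - continuous_mean) > tol:
        raise RuntimeError("Discrete mean differs from continuous mean significantly.")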

Hi @wulu473, I posted this issue on Opacus: pytorch/opacus#604, which elaborates on my previous comment. I didn't know whether to post it here or there; I posted it on Opacus because that is how I was exposed to it. I hope you don't mind!

@wulu473 We are currently looking into this a bit here in Helsinki because we face the error when computing the noise_multiplier with opacus for few-shot models. The $(\epsilon, \delta)$ values that seem to cause the error are not unrealistically high. E.g., prv_accountant.dpsgd.DPSGDAccountant fails for a value that would correspond to $\epsilon \approx 6.8$ and $\delta = 10^{-5}$ with RDP (first row of prv_broken_values.csv). The prv_accountant version is 0.2.0.

I used the implementation below with parameters that match the eps_error and delta_error used in opacus. (Not sure if they are reasonable, but I would think so.)

Code for reproducing:

from prv_accountant.dpsgd import DPSGDAccountant
import pandas as pd

# Each row holds a (sigma, sample_rate, steps, delta) combination that triggers the error.
df = pd.read_csv("results/prv_broken_values.csv")
for i, row in df.iterrows():
    try:
        accountant = DPSGDAccountant(
            noise_multiplier=row["sigma"],
            sampling_probability=row["sample_rate"],
            eps_error=0.01,
            delta_error=row["delta"] / 1000,
            max_steps=int(row["steps"])
        )

        eps_low, eps_estimate, eps_upper = accountant.compute_epsilon(num_steps=int(row["steps"]), delta=row["delta"])
        print(eps_upper)
    except RuntimeError:
        print("Broken prv")

prv_broken_values.csv:

sigma,sample_rate,steps,delta,corresponding_rdp_epsilon
1.1190338134765625,0.111111,90,1e-05,6.799326796587654
1.462371826171875,0.125,320,1e-05,9.531460587058456
1.3625717163085938,0.125,320,1e-05,10.637274866862626
1.429534912109375,0.125,320,1e-05,9.869719941118808
1.3596649169921875,0.125,320,0.0001,9.475935768346865
1.4189605712890625,0.125,320,0.0001,8.856518572214307
1.348785400390625,0.125,320,0.0001,9.597854678608819
1.5253524780273438,0.125,320,1e-07,10.738462714495805
1.57373046875,0.125,320,1e-07,10.255346101802672
0.997283935546875,0.125,80,1e-05,8.94039379409982
0.9454460144042969,0.125,80,1e-05,9.884252821197466
0.9023284912109375,0.125,80,1e-05,10.815695963263234
0.8504867553710938,0.125,40,1e-05,9.015349608271817
0.8737449645996094,0.125,40,1e-05,8.548087958662755
0.8320808410644531,0.125,40,1e-05,9.418355574895706
0.7974414825439453,0.125,40,1e-05,10.253677328523128
0.7291364669799805,0.125,16,1e-05,8.788822485767234
0.6870651245117188,0.125,16,1e-05,9.90809677856695
0.7284049987792969,0.125,16,1e-05,8.80783162412548
0.6993370056152344,0.125,16,1e-05,9.571710266550044
1.462371826171875,0.125,320,1e-05,9.531460587058456
1.3625717163085938,0.125,320,1e-05,10.637274866862626
1.429534912109375,0.125,320,1e-05,9.869719941118808
0.997283935546875,0.125,80,1e-05,8.94039379409982
0.9454460144042969,0.125,80,1e-05,9.884252821197466
0.9023284912109375,0.125,80,1e-05,10.815695963263234
0.8504867553710938,0.125,40,1e-05,9.015349608271817
0.8737449645996094,0.125,40,1e-05,8.548087958662755
0.8320808410644531,0.125,40,1e-05,9.418355574895706
0.7974414825439453,0.125,40,1e-05,10.253677328523128
1.0302276611328125,0.111111,90,1e-05,7.905717689020909
1.0704879760742188,0.111111,90,1e-05,7.366555785990796
1.0301132202148438,0.111111,90,1e-05,7.90732222804079

Hope the information is sufficient. We will have a look at whether we can contribute a fix.

Best

Hi @wulu473,

(Disclaimer: I just debugged the issue and am an expert neither on your implementation nor on privacy accounting.)

Observation

I debugged the code and at some point arrived at the mean() function of the PrivacyRandomVariableTruncated class. The grid (the points variable) used to compute the mean is constant apart from the lowest point (self.t_min) and the highest point (self.t_max). See the line of code here. It looks like this: [self.t_min, -0.1, -0.01, -0.001, -0.0001, -1e-05, 1e-05, 0.0001, 0.001, 0.01, 0.1, self.t_max].

It seems that t_min and t_max are of the order of [-12, 12] for the examples that I posted above, and even up to [-48, 48] for the example that @jeandut posted in the opacus issue (pytorch/opacus#604), whereas they are more like [-7, 7] for the readme example for DP-SGD.

We suspect that the integration breaks down when the grid spacing between t_min and t_max gets too large.
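
To illustrate the suspicion, a small standalone sketch (t_min/t_max of ±12 are assumed here, matching the magnitudes mentioned above) showing how large the biggest cell of the fixed grid becomes:

import numpy as np

t_min, t_max = -12.0, 12.0  # assumed magnitudes from the failing examples above
points = np.array([t_min, -0.1, -0.01, -0.001, -0.0001, -1e-05,
                   1e-05, 0.0001, 0.001, 0.01, 0.1, t_max])
print(np.diff(points).max())  # ~11.9: nearly the whole range [t_min, -0.1] falls into a single cell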

Proposed solution

Determine the points grid based on t_min and t_max, e.g., using the implementation below, which is inspired by the opacus implementation but determines the start and end of the logspace based on t_min and t_max.

# Anchor the logspace endpoints at the magnitudes of t_min / t_max so the grid scales with the truncation bounds
lower_exponent = int(np.log10(np.abs(self.t_min)))
upper_exponent = int(np.log10(self.t_max))
points = np.concatenate([[self.t_min], -np.logspace(start=lower_exponent, stop=-5, num=10), [0],
                         np.logspace(start=upper_exponent, stop=-5, num=10)[::-1], [self.t_max]])

If I run this, I don't get the error anymore and the epsilon for the readme example for DP-SGD is identical.
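
For reference, a standalone sketch of what the adapted grid looks like for t_min/t_max of about ±12 (values assumed for illustration, not taken from the library):

import numpy as np

t_min, t_max = -12.0, 12.0
lower_exponent = int(np.log10(np.abs(t_min)))  # 1
upper_exponent = int(np.log10(t_max))          # 1
points = np.concatenate([[t_min], -np.logspace(start=lower_exponent, stop=-5, num=10), [0],
                         np.logspace(start=upper_exponent, stop=-5, num=10)[::-1], [t_max]])
print(points)  # fills [-12, 12] with logarithmically spaced points instead of jumping from -12 straight to -0.1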

Question

Is this a harmless fix, or is there some theory from the PRV accountant that speaks against extending the grid here?

Thanks for looking into this. The proposed solution seems very sensible. Initially I had chosen points that give a robust solution for most realistic cases but it seems that there are some cases where it's insufficient. In general, adding any additional points is safe and won't affect the robustness negatively.

If you have this fix ready and it's not too much trouble, any contribution via a PR would be appreciated.

Great!

Please have a look at #38.

Btw, I just saw that #35 is still open. It would be great if you could close or merge it.