Allowing users to use large `target_epsilon` for debugging and research
jeandut opened this issue · 8 comments
Feature
Allowing users to use a large `target_epsilon` when making their pipeline DP (as input to `get_noise_multiplier` before `make_private`).
Motivation
Currently, a target budget of epsilon > 10 is generally not supported by Opacus for most datasets / batch sizes / numbers of gradient steps, because of discretization issues stemming from Microsoft's `prv_accountant` (microsoft/prv_accountant#36), from which Opacus's `PRVAccountant` implementation originates.
I argue that this should be changed for multiple reasons:
- it would allow researchers to study any epsilon regime they like;
- this cut-off of `epsilon=10` is completely arbitrary. It is well known that epsilon is highly application-dependent: for some applications it might make perfect sense to use epsilon as high as 50 if other mitigations are in place or if participants agree to release their data under this privacy setting; for other applications, `epsilon=10` could be considered way too high, for instance if the gradients are deemed extremely sensitive (remember that `exp(10.)=22,026`);
- it would allow debugging complex implementations based on Opacus. What happens if epsilon is very large? Do we retrieve the initial performance without DP, or are there still batching effects at play, even with very low noise, that impact accuracy?
Warning the user that the privacy budget they set could be considered high by some industries should be sufficient.
What do you think?
In the meantime, is there a relatively simple hack one could use in research experiments to use large target epsilons?
Pitch
The user provides a large `target_epsilon` (say 50.) to `get_noise_multiplier`, and, instead of throwing
`RuntimeError: Discrete mean differs from continuous mean significantly.`
the call displays a warning and runs to completion.
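The proposed behaviour can be sketched as a check that raises by default but only warns when the user opts out of strictness. This is a hypothetical helper, not prv_accountant's or Opacus's code: `check_discretization`, its `rel_tol` tolerance and the `strict` flag are illustrative assumptions; only the error message is taken from the report.

```python
import math
import warnings

# Error message quoted in the issue.
DISCRETIZATION_ERROR = "Discrete mean differs from continuous mean significantly."


def check_discretization(discrete_mean: float, continuous_mean: float,
                         rel_tol: float = 0.1, strict: bool = True) -> bool:
    """Hypothetical check comparing the mean of the discretized privacy loss
    with its continuous counterpart.

    rel_tol and strict are illustrative assumptions, not the actual
    parameters of prv_accountant or Opacus.
    Returns True when the two means agree within rel_tol.
    """
    if math.isclose(discrete_mean, continuous_mean, rel_tol=rel_tol):
        return True
    if strict:
        # Behaviour reported in the issue: a hard failure.
        raise RuntimeError(DISCRETIZATION_ERROR)
    # Proposed behaviour: keep going, but flag that epsilon may be inaccurate.
    warnings.warn(DISCRETIZATION_ERROR + " The reported epsilon may be inaccurate.",
                  RuntimeWarning, stacklevel=2)
    return False
```

Keeping `strict=True` as the default preserves today's safety net, while researchers who knowingly explore large epsilons can opt into the warning.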
Alternatives
In the meantime, is there a relatively simple hack one could use in research experiments to use large target epsilons? For example, exposing another custom `get_noise_multiplier` function without this limitation?
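One way to get such a custom function without modifying the accountant is a wrapper that tries one calibration routine and, if it raises `RuntimeError`, warns and retries with another. The sketch is hypothetical: `noise_multiplier_with_fallback` is not an Opacus function, and both routines are passed in as plain callables so the wrapper stays independent of any particular library version.

```python
import warnings
from typing import Callable


def noise_multiplier_with_fallback(primary: Callable[..., float],
                                   fallback: Callable[..., float],
                                   **params) -> float:
    """Hypothetical wrapper: calibrate the noise multiplier with `primary`;
    if it raises RuntimeError (such as the PRV discretization error),
    warn and retry with `fallback` using the same keyword arguments."""
    try:
        return primary(**params)
    except RuntimeError as err:
        warnings.warn(f"Primary calibration failed ({err}); using fallback.",
                      RuntimeWarning, stacklevel=2)
        return fallback(**params)
```

With Opacus installed, `primary` and `fallback` could for instance be `functools.partial(get_noise_multiplier, accountant="prv")` and `functools.partial(get_noise_multiplier, accountant="rdp")`, with `get_noise_multiplier` imported from `opacus.accountants.utils`; check that function's signature in your installed version before relying on this.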
Additional context
I hesitated between posting this in prv_accountant or here.
Hi @jeandut,
if I understood correctly, the
Could you provide the privacy parameters (`num_steps`, `noise_multiplier` and `subsampling_ratio`) for the case where it breaks down? We were able to fork the Opacus `get_noise_multiplier()` and make it a bit more robust, but we are still waiting on internal review before proposing it here.
Have you considered using the `rdp` accountant instead? It should work as a drop-in replacement for the `prv` accountant when creating the `privacy_engine`. The privacy guarantees of `prv` are tighter than those of `rdp`, but in the end you will just get a larger `noise_multiplier` for the same `target_epsilon` and `target_delta`. The `rdp` implementation is more robust in my experience.
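To illustrate why the RDP route stays numerically well-behaved at large epsilon, here is a standalone sketch of RDP-based noise calibration. In Opacus itself, the switch amounts to selecting the RDP accountant (for instance `PrivacyEngine(accountant="rdp")`; check the constructor in your installed version). The sketch assumes Poisson subsampling, computes the RDP of the subsampled Gaussian mechanism at integer orders 2 to 64 via the standard binomial expansion (in log space, so no discretization is involved), converts to (epsilon, delta) with the classic bound epsilon = steps · RDP(alpha) + log(1/delta)/(alpha − 1), and bisects for sigma in the spirit of Opacus's `get_noise_multiplier`. The names `rdp_sampled_gaussian`, `epsilon_rdp` and `noise_multiplier_rdp` are hypothetical, and Opacus's own RDP accountant may use a tighter conversion and different orders, so its numbers will differ somewhat.

```python
import math
from typing import Iterable

# Integer Renyi orders; the binomial expansion below is only valid for integer alpha.
ORDERS = tuple(range(2, 65))


def _logsumexp(values: Iterable[float]) -> float:
    values = list(values)
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def rdp_sampled_gaussian(q: float, sigma: float, alpha: int) -> float:
    """RDP at integer order `alpha` of ONE step of the Poisson-subsampled
    Gaussian mechanism (sampling rate q, noise multiplier sigma)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be in (0, 1)")
    log_q, log_1mq = math.log(q), math.log1p(-q)
    log_terms = (
        # log[ C(alpha, k) * (1-q)^(alpha-k) * q^k * exp((k^2 - k) / (2 sigma^2)) ]
        math.lgamma(alpha + 1) - math.lgamma(k + 1) - math.lgamma(alpha - k + 1)
        + (alpha - k) * log_1mq + k * log_q
        + (k * k - k) / (2.0 * sigma ** 2)
        for k in range(alpha + 1)
    )
    return _logsumexp(log_terms) / (alpha - 1)


def epsilon_rdp(q: float, sigma: float, steps: int, delta: float) -> float:
    """Epsilon after `steps` compositions, using the classic conversion
    eps = steps * RDP(alpha) + log(1/delta) / (alpha - 1), minimised over alpha."""
    return min(
        steps * rdp_sampled_gaussian(q, sigma, a) + math.log(1.0 / delta) / (a - 1)
        for a in ORDERS
    )


def noise_multiplier_rdp(target_epsilon: float, target_delta: float,
                         sample_rate: float, steps: int,
                         epsilon_tolerance: float = 0.01,
                         max_sigma: float = 1e4) -> float:
    """Smallest sigma (up to epsilon_tolerance) with epsilon <= target_epsilon."""
    def eps(sigma: float) -> float:
        return epsilon_rdp(sample_rate, sigma, steps, target_delta)

    # Grow the upper bracket until it satisfies the target.
    sigma_low, sigma_high = 0.0, 10.0
    eps_high = eps(sigma_high)
    while eps_high > target_epsilon:
        sigma_low, sigma_high = sigma_high, 2.0 * sigma_high
        if sigma_high > max_sigma:
            raise ValueError("target_epsilon is unreachable with these orders")
        eps_high = eps(sigma_high)

    # Bisect: epsilon decreases monotonically as sigma grows.
    while target_epsilon - eps_high > epsilon_tolerance:
        sigma = (sigma_low + sigma_high) / 2.0
        eps_mid = eps(sigma)
        if eps_mid < target_epsilon:
            sigma_high, eps_high = sigma, eps_mid
        else:
            sigma_low = sigma
    return sigma_high
```

For example, `noise_multiplier_rdp(50.0, 1e-3, 0.19120458891013384, 5000)` returns a sigma whose epsilon lies within 0.01 below the target; lowering the target epsilon yields a larger sigma. Because every quantity is a closed-form sum evaluated in log space, large targets only push sigma down instead of triggering a discretization check.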
Thank you so much @Solosneros for your quick answer!
Indeed, the cut-off is due to the numerical implementation not being robust enough with large epsilon, and not because of some ad-hoc logic. It is very cool that you have something in store to make it more robust.
In my use case I am just tracing a DP curve for a research article and want to make sure that my implementation is asymptotically equivalent to the one without DP by taking epsilon very large, i.e. for debugging purposes.
I have run into this issue for multiple parameter values; typical values include:
| sample_rate | num_total_steps | dp_target_epsilon | dp_target_delta |
|---|---|---|---|
| 0.19120458891013384 | 5000 | 20.0 | 0.001 |
| 0.19120458891013384 | 5000 | 50.0 | 0.001 |
As for why I am using `prv`: it was the default in Opacus. But you are right, a quick fix would probably be to switch to `rdp`. I will try that in the meantime!
Using RDP instead of PRV does indeed allow high epsilons. Feel free to either close or relabel my issue.
Hi @jeandut,
Sorry for the delay. We tested our fix further, and while it seems a bit more robust, it still fails occasionally.
We'll get back to this, but maybe somebody else has a fix. The RDP workaround definitely works.
I posted a potential fix for this on the prv_accountant GitHub, but I am not sure it is valid. Let's see what the folks from Microsoft say: microsoft/prv_accountant#36 (comment)
A PR is open: #606.
Thanks! Will take a look.
Committed PR #606, so I am closing this issue.