Determinism and Noise Issues
Opened this issue · 3 comments
Describe the bug
The Fft
generation here is based on 10ms duration. Not only duration is not always precise, but also we are worried that this implementation causes excess noise depending on target machine.
When developing and testing the Belfort FHE Accelerator, we need to carefully evaluate the effects of noise. After noticing discrepancies, we used deterministic engine for debug. However, we observed that different bootstrapping keys were generated across executions with same seed. The loss of determinism is rare on AMD EPYC processors (3.5 GHz or 4.1 GHz), occurring roughly 1 in 10,000 executions. In contrast, it happens frequently on Intel Xeon processors (Silver 2.1 GHz and E5 2.3 GHz), where about 1 in 10 of executions produce different bootstrapping keys for the same seed.
The variations are sometimes minor, limited to fractional digits of the coefficients, but occasionally, even integer bits differ, resulting in failed computations. These failures are more concerning than the non-determinism itself, as they probably mean an increase in injected noise, which disrupts the expected computation outcomes.
We suspect that the issue may be related to reduced precision on lower clock frequency CPUs, rather than being specific to processor brands. However, we could not verify this hypothesis, as we lack access to faster Intel CPUs or slower AMD CPUs.
To Reproduce
Steps to reproduce the behaviour
- Generate keys
let seed = ...
let mut seeder = DeterministicSeeder::<ActivatedRandomGenerator>::new(Seed(seed as u128));
let shortint_engine = ShortintEngine::new_from_seeder(&mut seeder);
ShortintEngine::replace_thread_local(shortint_engine);
let params: ClassicPBSParameters = tfhe::shortint::parameters::PARAM_MESSAGE_2_CARRY_2;
let client_key = ClientKey::new(params);
let server_key = ServerKey::new(&client_key);
- Print your bootstrapping key. Printing only 10 is enough, as problems appear quickly
let fourier_bsk = match server_key.clone().bootstrapping_key {
ShortintBootstrappingKey::Classic(flbko) => flbko,
ShortintBootstrappingKey::MultiBit { .. } => panic!("Ingore this")
};
let bsk_flat = fourier_bsk.clone().data().to_vec();
for (index, elem) in bsk_flat.iter().take(10).enumerate() {
println!("Index: {}, Value: {:?}", index, elem);
}
- Execute this app many times, and compare the printed results. As mentioned above, even small number of executions are enough to capture it, depending on your CPU. Let's say, 0.0001% chance on some processors but 10% on others.
Expected behaviour
Once the seed is fixed, we expect to get the same exact bootstrapping key for any execution.
Configuration(please complete the following information):
- OS:
Ubuntu 22.04.4 LTS
- CPU:
Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
Additional context
For now, we are using experimental-force_fft_algo_dif4
feature, which fixes the issues:
- Determinism: Always the same bootstrapping key for the same seed
- Accuracy: Always correct calculations given different seeds
We tested these with 10'000 executions over a night, not more.
hello @furkanturan this is a known behavior for now, we are considering options at this point.
When you talk about increased noise, can you provide a measure of the output noise with different fft algorithms ?
The reason this is likely not a problem is the following:
consider an fft computation where the end result after ifft in a coefficient is close to 0.5 either smaller or bigger, depdending on some noise of the fft then we may be slightly over or under that value, then when we do the conversion, we first apply modulo 1 to be on the torus, so 0.49 is 0.49 and 0.51 is -0.49, that value is then converted to integers by multiplying by 2^64 e.g., then we may find values close to -2^63 or 2^63 which will have completely different bit representations (due to 2's complement) but on the torus are essentially the same value.
If you observe noise issues please let us know, for now I'm triaging this as not a bug but a feature request/improvement request
Cheers
@furkanturan do you have examples of the noise problem you say you are having ?
Hello @IceTDrinker
Sorry I could not follow this up earlier.
I do not have the exact copies of the debug output. As far as I remember, the issue looked like below, when I wrote the bootstrapping key to a json file. Often it was as the one on the left. Sometimes, it is the one in the middle, only some minor differences. Rarely, it is the one the right, which causes problems.
Those problems are often not an issue for TFHE-rs on CPU, but more of a problem for HW acceleration. I guess that is because we optimize the hardware for an given level of noise. That is why I suspected of high noise injection into the system; however, your explanation regarding FFT and noise is quite correct. Hence, I am doubtful what its implications is.
Anyways, using the experimental-force_fft_algo_dif4
has fixed the problem for us.
Kind regards,
Furkan
Expected | Minor Issues | Major Issues
---------------------------------|----------------------------------|---------------------------------
[ | [ | [
[ | [ | [
[ | [ | [
[ | [ | [
[ | [ | [
[ | [ | [
2.040106057990813, | 2.040106057990813, | 2.040106057990813,
12.705700651141097 | 12.705700651141098 | 12.705700651141097
], | ], | ],
[ | [ | [
2.872284605134971, | 2.872284605134975, | 2.9348120841290380,
-7.921074881945759 | -7.921074881945758 | -8.0104210391239121
], | ], | ],