Unbiased random prime generation
fjarri opened this issue · 5 comments
Currently random prime generation starts from a random number and runs a sieve until a prime is found. This can introduce bias, selecting primes with large leads more often. Some assorted considerations:
- How dangerous is it, actually? Any sources?
- GMP re-samples the start position every 0x10000 samples, quoting "deep science" (with no further explanations). OpenSSL doesn't re-sample.
- Just sampling random numbers each time and running BPSW on them makes the generation several times slower.
- To avoid touching the RNG many times, we can start with a random
a < 2^(k-1)
, and generate candidates as2^(k-1) + (a + i * b mod 2^(k-1))
whereb
is a random odd number. This will uniformly cover all the range[2^(k-1), 2^k)
(right?) May be a little faster but not too much. - See https://eprint.iacr.org/2011/481.pdf ("Close to Uniform Prime Number Generation With Fewer Random Bits") for a more advanced algorithm.
Some investigation results of the algorithm from the paper above (Fig. 2). The steps 1-7 (finding b
that's coprime to a product of small primes m
) are quite fast, usually b
is found in 1-2 iterations. But since m
is combined out of a smaller number of primes we use in the sieve, the iteration in the steps 8-11 produces more composites compared to Sieve
. For example, for U1024
, and \ell = 128
, m
is composed out of ~120 primes, while Sieve
uses 2047 primes. This makes finding a prime about two times slower.
So it doesn't seem like we can just replace the existing sieve, but the method does have value when an unbiased generation is required. Notes:
- How do we select
b
if we're looking for a safe prime? Can we adjust the iteration so that the resulting2b + 1
(orb/2
) is also coprime tom
? Otherwise the difference with regular sieving will be huge. - We will have to pre-compute
m
,lambda(m)
and so on for eachUint
size we plan to use. Probably put it in a trait. - Note that since we're using Montgomery representation to find
b
,m
must be odd, sob
may be even. In the steps 8-11 we can adjust the generateda
so thata * m + b
is always odd (or= 3 mod 4
for safe primes). - It would be useful to have a "random nonzero
DynResidue
" method, so that we don't have to convert the generated randoms into Montgomery in the step 4.
I can note my explicit interest in this for use in a hash-to-prime function. I'd also be curious to benchmark it when it's not using a distinctly sampled random numbers, yet only generating a single new byte/chunk and shifting over the generated bits (questioning how much of the cost is due to generating KB-MB of random numbers).
I didn't measure it, but it feels like the contribution of RNG is quite minimal.