entropyxyz/crypto-primes

Unbiased random prime generation

fjarri opened this issue · 5 comments

Currently random prime generation starts from a random number and runs a sieve until a prime is found. This can introduce bias, selecting primes with large leads more often. Some assorted considerations:

  • How dangerous is it, actually? Any sources?
  • GMP re-samples the start position every 0x10000 samples, quoting "deep science" (with no further explanations). OpenSSL doesn't re-sample.
  • Just sampling random numbers each time and running BPSW on them makes the generation several times slower.
  • To avoid touching the RNG many times, we can start with a random a < 2^(k-1), and generate candidates as 2^(k-1) + (a + i * b mod 2^(k-1)) where b is a random odd number. This will uniformly cover all the range [2^(k-1), 2^k) (right?) May be a little faster but not too much.
  • See https://eprint.iacr.org/2011/481.pdf ("Close to Uniform Prime Number Generation With Fewer Random Bits") for a more advanced algorithm.

Some investigation results of the algorithm from the paper above (Fig. 2). The steps 1-7 (finding b that's coprime to a product of small primes m) are quite fast, usually b is found in 1-2 iterations. But since m is combined out of a smaller number of primes we use in the sieve, the iteration in the steps 8-11 produces more composites compared to Sieve. For example, for U1024, and \ell = 128, m is composed out of ~120 primes, while Sieve uses 2047 primes. This makes finding a prime about two times slower.

So it doesn't seem like we can just replace the existing sieve, but the method does have value when an unbiased generation is required. Notes:

  • How do we select b if we're looking for a safe prime? Can we adjust the iteration so that the resulting 2b + 1 (or b/2) is also coprime to m? Otherwise the difference with regular sieving will be huge.
  • We will have to pre-compute m, lambda(m) and so on for each Uint size we plan to use. Probably put it in a trait.
  • Note that since we're using Montgomery representation to find b, m must be odd, so b may be even. In the steps 8-11 we can adjust the generated a so that a * m + b is always odd (or = 3 mod 4 for safe primes).
  • It would be useful to have a "random nonzero DynResidue" method, so that we don't have to convert the generated randoms into Montgomery in the step 4.

I can note my explicit interest in this for use in a hash-to-prime function. I'd also be curious to benchmark it when it's not using a distinctly sampled random numbers, yet only generating a single new byte/chunk and shifting over the generated bits (questioning how much of the cost is due to generating KB-MB of random numbers).

fjarri commented

I didn't measure it, but it feels like the contribution of RNG is quite minimal.