malb/lattice-estimator

Runtime of `estimate()` for SEAL parameter set

Opened this issue · 10 comments

TabOg commented

When calling LWE.estimate(SEAL22_32768), where SEAL22_32768 is the largest SEAL parameter set given in schemes, the program has a very long runtime: I've been running on 4 CPUs (with jobs=32) for 4 hours and haven't seen any estimates, only the following error message:
Algorithm <estimator.lwe_bkw.CodedBKW object at ---> on LWEParameters(n=32768, q=4127301024497384737127654569660285988428494734657199391624693039270889863724412964643884811622321780427143710884821317803768340308614730759769835769241715444596770968742227220068214981847081570726751819595399909407406471037121576084674975771617472472542994965871984641, Xs=D(σ=0.82), Xe=D(σ=3.19), m=+Infinity, tag='SEAL22_32768') failed with cannot compute ceil(log(4127301024497384737127654569660285988428494734657199391624693039270889863724412964643884811622321780427143710884821317803768340308614730759769835769241715444596770968742227220068214981847081570726751819595399909407406471037121576084674975771617472472542994965871984641)/log(2)) using 256 bits of precision
For parameters this large, should only LWE.estimate.rough be used?
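
As an aside on the precision failure itself: for an exact integer q, ceil(log(q)/log(2)) can be computed exactly with integer arithmetic and no precision parameter at all, e.g. via Python's int.bit_length(). A minimal sketch (illustrative only, not estimator code):

```python
def ceil_log2(q: int) -> int:
    """Exact ceil(log2(q)) for a positive integer, with no floating point.

    q.bit_length() equals floor(log2(q)) + 1 for q >= 1, so we only need
    to special-case exact powers of two, where ceil == floor.
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    return q.bit_length() - 1 if q & (q - 1) == 0 else q.bit_length()
```

For a q of the size in the error above this returns instantly, whereas evaluating log(q)/log(2) at a fixed 256 bits of precision fails.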

malb commented

Ideally, the estimate would complete in a reasonable amount of time, so I consider this a bug. It might be useful to put things on the deny list to figure out which algorithm is so slow?

TabOg commented

To get it to finish in under 2 minutes, I need to skip arora-gb and all of the bdd* algorithms; bkw has the precision issue above.

It makes sense to me that the hybrid attacks would be slow for large n, but bdd seems to behave weirdly as n grows:

sage: %time LWE.primal_bdd(schemes.SEAL20_1024)
CPU times: user 3.83 s, sys: 12.8 ms, total: 3.84 s
Wall time: 3.84 s
rop: ≈2^71.3, red: ≈2^71.1, svp: ≈2^68.5, β: 137, η: 167, d: 1952, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_2048)
CPU times: user 15.2 s, sys: 27 ms, total: 15.2 s
Wall time: 15.2 s
rop: ≈2^72.4, red: ≈2^72.2, svp: ≈2^69.9, β: 137, η: 172, d: 4030, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_4096)
CPU times: user 1min 1s, sys: 235 ms, total: 1min 2s
Wall time: 1min 2s
rop: ≈2^72.1, red: ≈2^72.1, svp: ≈2^66.3, β: 133, η: 159, d: 7851, tag: bdd

primal_usvp takes ~2ms for each of the above param sets and returns essentially the same estimate (as we might expect).

TabOg commented

Runtime quadrupling as n doubles? So runtime proportional to n**2?
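
Quantifying that from the wall times reported above (3.84 s, 15.2 s, 62 s for n = 1024, 2048, 4096), the successive ratios are indeed close to 4:

```python
# Wall times (seconds) from the primal_bdd runs above.
times = {1024: 3.84, 2048: 15.2, 4096: 62.0}
ns = sorted(times)

# If runtime ~ n**2, each doubling of n should multiply the time by ~4.
ratios = [times[b] / times[a] for a, b in zip(ns, ns[1:])]
print([round(r, 2) for r in ratios])  # → [3.96, 4.08]
```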

Plus the fact that it is much more expensive than primal_usvp, since they should be doing roughly the same thing. I've started to look at bdd over here; some small tweaks seem to drop the running time, although the stability of the resulting estimates needs a more in-depth look.

sage: %time LWE.primal_bdd(schemes.SEAL20_1024)
CPU times: user 2.02 s, sys: 12.4 ms, total: 2.03 s
Wall time: 2.03 s
rop: ≈2^71.2, red: ≈2^71.1, svp: ≈2^67.4, β: 137, η: 163, d: 2015, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_2048)
CPU times: user 6.02 s, sys: 56.1 ms, total: 6.07 s
Wall time: 6.12 s
rop: ≈2^72.6, red: ≈2^72.4, svp: ≈2^69.6, β: 138, η: 171, d: 3858, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_4096)
CPU times: user 22 s, sys: 58.7 ms, total: 22 s
Wall time: 22 s
rop: ≈2^72.2, red: ≈2^72.1, svp: ≈2^68.2, β: 133, η: 166, d: 7783, tag: bdd

The best solution might be to separate bdd and primal_hybrid entirely.

It looks like the cost comes entirely from the calls to svp_dimension in lwe_primal.py.

If I run the following script (seal_20_4096.py):

from estimator import *
LWE.primal_bdd(schemes.SEAL20_4096)

via cProfile:

sage --python -m cProfile -s tottime seal_20_4096.py &> seal_20_4096_stats

I get these results:

         4453292 function calls (4411138 primitive calls) in 53.418 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23   50.367    2.190   50.410    2.192 lwe_primal.py:257(svp_dimension)
       31    0.685    0.022    0.731    0.024 simulator.py:101(<listcomp>)
       ....

So the vast majority of that 53 seconds comes from a few expensive calls to svp_dimension.

My guess is that this is actually an fpylll problem: it's likely that the actual gaussian_heuristic call is expensive, possibly due to the range of values.

For example, if I print out the values of r inside svp_dimension for these parameters then the first few seem quite large:

[9.67371330140938e114, 9.36982916809639e114, 9.07549106572299e114, 8.79039912109246e114, 
8.51426288104086e114, 8.24680101652289e114, 7.98774103599343e114, 7.73681900778966e114, 
7.49377929123748e114, 7.25837427620310e114, 7.03036413082657e114, 6.80951655717978e114,
 6.59560655459990e114, 6.38841619045855e114, 6.18773437812932e114, 5.99335666193247e114,
 5.80508500883504e114, 5.62272760669236e114, 5.44609866882982e114, 5.27501824476213e114,
 5.10931203685889e114, 4.94881122276919e114, 4.79335228342441e114, 4.64277683644481e114, 
... 10.0497632947441]

There's probably some work that can be saved here: for example, fpylll's gaussian_heuristic recomputes the log norm of each entry every time the function is called, but here that isn't needed since r doesn't change. Depending on how much of the cost comes from the logs vs the exponentiation, a different (batched) gaussian heuristic computation might be worthwhile. cProfile doesn't break the cost down that far, so the quickest way to find out is probably just to try it.

@malb Do you have an opinion on if an amortized gaussian heuristic function should live inside fpylll or the estimator? I'm thinking of a function that essentially takes this loop fragment:

d = len(r)
for i, _ in enumerate(r):
    if gaussian_heuristic(r[i:]) < D.stddev**2 * (d - i):
        ...

and computes certain repeated costs (e.g. the logs of the r values) once, rather than on each call. It seems like something fpylll should have, but that would tie users of the estimator to a particular version of fpylll.
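
For concreteness, here is a rough sketch of what such an amortized computation could look like. This is an illustration under assumptions, not estimator or fpylll code: it assumes r holds squared Gram-Schmidt norms (as fpylll's gaussian_heuristic expects) and uses the standard formula log gh^2 = (sum_i log r_i - 2 * log V_k) / k for a rank-k block, where V_k is the volume of the k-dimensional unit ball. The logs and their suffix sums are computed once, so each suffix's heuristic is then O(1):

```python
from math import exp, lgamma, log, pi


def log_ball_vol(k):
    # log of the volume of the k-dimensional Euclidean unit ball
    return (k / 2.0) * log(pi) - lgamma(k / 2.0 + 1)


def suffix_gaussian_heuristics(r):
    """Squared Gaussian-heuristic norms for every suffix r[i:].

    Computes the logs of the (squared) norms once and reuses suffix
    sums, instead of calling gaussian_heuristic(r[i:]) per iteration.
    """
    d = len(r)
    log_r = [log(x) for x in r]           # logs computed once
    suffix = [0.0] * (d + 1)
    for i in range(d - 1, -1, -1):        # suffix sums of log r_i
        suffix[i] = suffix[i + 1] + log_r[i]
    out = []
    for i in range(d):
        k = d - i                         # dimension of the suffix block
        out.append(exp((suffix[i] - 2 * log_ball_vol(k)) / k))
    return out
```

The svp_dimension loop could then compare out[i] against D.stddev**2 * (d - i) directly, with no per-iteration gaussian_heuristic call.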

malb commented

To me this "feels" rather estimator-y, no? Why do you think it might live in FPyLLL?

TabOg commented

Because it's also pretty fpylll-y, no?

We'd need to include the logic for computing the Gaussian Heuristic in the estimator. IIRC, the Gaussian Heuristic computation in fpylll leans on a few internal constants (e.g. the log volume of a ball) to make things faster.

I'm not sure if you'd want to reproduce that in the estimator.

malb commented

To me this fits quite naturally here; I'd just "open up" gaussian_heuristic and do the appropriate caching "by hand", but I'm not married to that.