Runtime of `estimate()` for SEAL parameter set
Opened this issue · 10 comments
When calling `LWE.estimate(SEAL22_32768)`, where `SEAL22_32768` is the largest SEAL parameter set given in `schemes`, the program has a very long runtime: I've been running on 4 CPUs (set `jobs=32`) for 4 hours, and I haven't seen any estimates, only the following error message:
```
Algorithm <estimator.lwe_bkw.CodedBKW object at ---> on LWEParameters(n=32768, q=4127301024497384737127654569660285988428494734657199391624693039270889863724412964643884811622321780427143710884821317803768340308614730759769835769241715444596770968742227220068214981847081570726751819595399909407406471037121576084674975771617472472542994965871984641, Xs=D(σ=0.82), Xe=D(σ=3.19), m=+Infinity, tag='SEAL22_32768') failed with cannot compute ceil(log(4127301024497384737127654569660285988428494734657199391624693039270889863724412964643884811622321780427143710884821317803768340308614730759769835769241715444596770968742227220068214981847081570726751819595399909407406471037121576084674975771617472472542994965871984641)/log(2)) using 256 bits of precision
```
For parameters this large, should only `LWE.estimate.rough` be used?
Ideally, the estimate would complete in a reasonable amount of time, so I consider this a bug. It might be useful to put algorithms on the deny list one by one to figure out which one is so slow.
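For what it's worth, the `ceil(log(q)/log(2))` failure itself looks avoidable: for an integer modulus `q`, that quantity can be computed exactly with arbitrary-precision integer arithmetic instead of 256-bit floats. A minimal sketch (the helper name `ceil_log2` is hypothetical, not part of the estimator):

```python
def ceil_log2(q: int) -> int:
    """Exact ceil(log2(q)) for a positive integer q, with no floating point."""
    assert q >= 1
    # bit_length() returns floor(log2(q)) + 1 for q >= 1;
    # q is a power of two exactly when q & (q - 1) == 0.
    return q.bit_length() - 1 if q & (q - 1) == 0 else q.bit_length()

# Handles moduli far beyond float precision without issue:
print(ceil_log2(2**851 + 1))  # -> 852 (a large example modulus)
```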
To get it to finish in < 2 mins, I need to skip `arora-gb` and all of the `bdd*`: `bkw` has the issue above. It makes sense to me that the hybrid attacks would be slow for large `n`, but `bdd` seems to behave weirdly as `n` grows:
```
sage: %time LWE.primal_bdd(schemes.SEAL20_1024)
CPU times: user 3.83 s, sys: 12.8 ms, total: 3.84 s
Wall time: 3.84 s
rop: ≈2^71.3, red: ≈2^71.1, svp: ≈2^68.5, β: 137, η: 167, d: 1952, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_2048)
CPU times: user 15.2 s, sys: 27 ms, total: 15.2 s
Wall time: 15.2 s
rop: ≈2^72.4, red: ≈2^72.2, svp: ≈2^69.9, β: 137, η: 172, d: 4030, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_4096)
CPU times: user 1min 1s, sys: 235 ms, total: 1min 2s
Wall time: 1min 2s
rop: ≈2^72.1, red: ≈2^72.1, svp: ≈2^66.3, β: 133, η: 159, d: 7851, tag: bdd
```
`primal_usvp` takes ~2 ms for each of the above parameter sets and returns essentially the same estimate (as we might expect).

The runtime quadruples with doubling `n`? So runtime proportional to `n**2`?
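A throwaway check against the wall times in the transcript above supports that guess; this snippet just fits the per-doubling exponent:

```python
import math

# Wall times (seconds) for primal_bdd from the runs above, keyed by n.
times = {1024: 3.84, 2048: 15.2, 4096: 62.0}

ns = sorted(times)
for a, b in zip(ns, ns[1:]):
    # exponent k such that runtime ~ n**k across this doubling of n
    k = math.log(times[b] / times[a], 2)
    print(f"n: {a} -> {b}, empirical exponent ~ {k:.2f}")
```

Both exponents come out very close to 2.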
Plus the fact that it is much more expensive than `primal_usvp`, since they should be doing roughly the same thing. I've started to look at `bdd` over here; some small tweaks seem to drop the running time, although the stability of the resulting estimates needs a more in-depth look.
```
sage: %time LWE.primal_bdd(schemes.SEAL20_1024)
CPU times: user 2.02 s, sys: 12.4 ms, total: 2.03 s
Wall time: 2.03 s
rop: ≈2^71.2, red: ≈2^71.1, svp: ≈2^67.4, β: 137, η: 163, d: 2015, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_2048)
CPU times: user 6.02 s, sys: 56.1 ms, total: 6.07 s
Wall time: 6.12 s
rop: ≈2^72.6, red: ≈2^72.4, svp: ≈2^69.6, β: 138, η: 171, d: 3858, tag: bdd

sage: %time LWE.primal_bdd(schemes.SEAL20_4096)
CPU times: user 22 s, sys: 58.7 ms, total: 22 s
Wall time: 22 s
rop: ≈2^72.2, red: ≈2^72.1, svp: ≈2^68.2, β: 133, η: 166, d: 7783, tag: bdd
```
The best solution might be to separate `bdd` and `primal_hybrid` entirely.
It looks like the cost comes entirely from the calls to `svp_dimension` in `lwe_primal.py`. If I run the following script (`seal_20_4096.py`):
```python
from estimator import *
LWE.primal_bdd(schemes.SEAL20_4096)
```
via cProfile:

```
sage --python -m cProfile -s tottime seal_20_4096.py &> seal_20_4096_stats
```

it gives me these results:
```
4453292 function calls (4411138 primitive calls) in 53.418 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23   50.367    2.190   50.410    2.192 lwe_primal.py:257(svp_dimension)
       31    0.685    0.022    0.731    0.024 simulator.py:101(<listcomp>)
....
```
So the vast majority of that 53 seconds comes from a few expensive calls to `svp_dimension`. My guess is that this is actually an fpylll problem: it's likely that the `gaussian_heuristic` call itself is expensive, possibly due to the range of values.
For example, if I print out the values of `r` inside `svp_dimension` for these parameters, then the first few entries are quite large:

```
[9.67371330140938e114, 9.36982916809639e114, 9.07549106572299e114, 8.79039912109246e114,
 8.51426288104086e114, 8.24680101652289e114, 7.98774103599343e114, 7.73681900778966e114,
 7.49377929123748e114, 7.25837427620310e114, 7.03036413082657e114, 6.80951655717978e114,
 6.59560655459990e114, 6.38841619045855e114, 6.18773437812932e114, 5.99335666193247e114,
 5.80508500883504e114, 5.62272760669236e114, 5.44609866882982e114, 5.27501824476213e114,
 5.10931203685889e114, 4.94881122276919e114, 4.79335228342441e114, 4.64277683644481e114,
 ... 10.0497632947441]
```
There's probably some work that can be avoided here: for example, fpylll's `gaussian_heuristic` recomputes the log norm of each entry every time the function is called, but here that isn't needed since `r` doesn't change. Depending on how much of the time cost comes from the logs vs the exponentiation, a different (batched) Gaussian heuristic computation may be the way to go. cProfile doesn't give us this information, so maybe just trying it is the best approach.
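To sketch what a batched version could look like, assuming fpylll's convention that `r` holds the squared Gram-Schmidt norms: the Gaussian heuristic of every tail `r[kappa:]` can be computed from a single pass of logs plus suffix sums, so the repeated per-call log work disappears. The helper names (`amortized_gh`, `gh_squared`) are hypothetical, not fpylll's API:

```python
import math

def gh_squared(log_r_sum: float, n: int) -> float:
    """Gaussian heuristic (squared norm) for an n-dimensional block whose
    squared Gram-Schmidt norms have logs summing to log_r_sum."""
    # log volume of the n-dimensional unit ball: (n/2)*log(pi) - lgamma(n/2 + 1)
    log_vol_ball = (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)
    # GH: lambda_1^2 ~ exp((sum_i log r_i - 2 * log V_n(1)) / n)
    return math.exp((log_r_sum - 2 * log_vol_ball) / n)

def amortized_gh(r):
    """Gaussian heuristic of every tail r[kappa:], taking the logs of r once."""
    logs = [math.log(x) for x in r]        # computed once, not once per call
    suffix = [0.0] * (len(r) + 1)
    for i in range(len(r) - 1, -1, -1):    # suffix sums of the log profile
        suffix[i] = suffix[i + 1] + logs[i]
    return [gh_squared(suffix[k], len(r) - k) for k in range(len(r))]
```

After the O(n) preprocessing, each tail costs a single `lgamma`/`exp`, which matches the access pattern of the `svp_dimension` loop.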
@malb Do you have an opinion on whether an amortized Gaussian heuristic function should live inside fpylll or the estimator? I'm thinking of a function that essentially takes this loop fragment:
`lattice-estimator/estimator/lwe_primal.py`, lines 265 to 267 in `e394b8a`
and computes certain repeated costs (e.g. the logs of the `r` values) once, rather than on each call. It seems like something fpylll should have, but that then ties users of the estimator to a particular version of fpylll.
To me this "feels" rather estimator-y, no? Why do you think it might live in FPyLLL?
Because it's also pretty fpylll-y, no?
We'd need to include the logic for computing the Gaussian heuristic in the estimator. IIRC, the Gaussian heuristic computation in fpylll leans on a few internal constants (e.g. the log volume of a ball) to make things faster. I'm not sure you'd want to reproduce that in the estimator.
To me this fits quite naturally here; I'd just "open up" `gaussian_heuristic` and do the appropriate caching "by hand", but I'm not married to that.