malb/lattice-estimator

CheNgu12 enumeration fit data

Opened this issue · 11 comments

In the source for the CheNgu12 enumeration cost model it is mentioned that the formula for the exponent in the SVP cost was fitted to data from Table 4 of the BKZ 2.0 paper [CheNgu12]. However, inspection of [CheNgu12] shows that the numbers used for the fit are higher than those provided in the paper for linear pruning. It instead appears that the numbers used for the fit were taken from Table 5.2 (page 119) of Yuanmi Chen's thesis.

This may be a documentation bug. It is also interesting that the extreme-pruning numbers in the thesis are higher than the linear-pruning ones in [CheNgu12]; maybe a re-fit to those would be worthwhile?

For added context, I ran the enumeration fit code on the various data sets and obtained slightly different exponents:

sage: print("Yuanmi Chen 13 extreme") 
....: dim = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250] 
....: nodes = [39.0, 44.0, 49.0, 54.0, 60.0, 66.0, 72.0, 78.0, 84.0, 96.0, 99.0, 105.0, 111.0, 120.0, 127.0, 134.0]  # noqa 
....: times = [c + log(200,2).n() for c in nodes] 
....: T = list(zip(dim, nodes)) 
....: var("a,b,c,beta") 
....: f = a*beta*log(beta, 2.0) + b*beta + c 
....: f = f.function(beta) 
....: f.subs(find_fit(T, f, solution_dict=True)) 
....:  
....: print("CheNgu12 linear") 
....: dim = [100, 120, 140, 160, 180, 200, 220, 240, 280, 380] 
....: nodes = [33.6, 44.5, 56.1, 68.2, 80.7, 93.7, 107.0, 120.6, 148.8, 223.5] # noqa 
....: times = [c + log(200,2).n() for c in nodes] 
....: T = list(zip(dim, nodes)) 
....: var("a,b,c,beta") 
....: f = a*beta*log(beta, 2.0) + b*beta + c 
....: f = f.function(beta) 
....: f.subs(find_fit(T, f, solution_dict=True)) 
....:  
....: print("CheNgu12 extreme") 
....: dim = [100, 120, 140, 160, 180, 200, 220, 240, 280, 380] 
....: nodes = [9, 15, 21.7, 28.8, 36.4, 44.4, 52.8, 61.5, 79.8, 129.9] 
....: times = [c + log(200,2).n() for c in nodes] 
....: T = list(zip(dim, nodes)) 
....: var("a,b,c,beta") 
....: f = a*beta*log(beta, 2.0) + b*beta + c 
....: f = f.function(beta) 
....: f.subs(find_fit(T, f, solution_dict=True)) 

Output:

Yuanmi Chen 13 extreme
(a, b, c, beta)
beta |--> 0.270188776350190*beta*log(beta) - 1.0192050451318417*beta + 16.10253135200765
CheNgu12 linear
(a, b, c, beta)
beta |--> 0.181790170093418*beta*log(beta) - 0.4882442914735114*beta - 1.3146955691545792
CheNgu12 extreme
(a, b, c, beta)
beta |--> 0.182571394958456*beta*log(beta) - 0.7398311715129441*beta - 1.0833549677345786
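For reference, the same fit can be reproduced without Sage. The model is linear in (a, b, c), so ordinary least squares suffices; a minimal sketch with NumPy, fitting the Chen13 extreme-pruning data in natural-log form to match the output above:

```python
import numpy as np

# Chen13 extreme-pruning data (Table 5.2 of Chen's thesis, as above)
dim = np.array([100, 110, 120, 130, 140, 150, 160, 170, 180, 190,
                200, 210, 220, 230, 240, 250], dtype=float)
nodes = np.array([39.0, 44.0, 49.0, 54.0, 60.0, 66.0, 72.0, 78.0,
                  84.0, 96.0, 99.0, 105.0, 111.0, 120.0, 127.0, 134.0])

# Model a*beta*ln(beta) + b*beta + c is linear in (a, b, c), so a
# least-squares solve gives the same answer Sage's find_fit converges to.
A = np.column_stack([dim * np.log(dim), dim, np.ones_like(dim)])
(a, b, c), *_ = np.linalg.lstsq(A, nodes, rcond=None)
print(a, b, c)  # roughly 0.2702, -1.0192, 16.10
```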
malb commented

Okay, looks like we should update to Chen 13 extreme, right?

Yep, unless we believe the CheNgu12 params are better? They predict faster enumeration, not sure if independent experimental data agrees with it.

malb commented

Chen13 seems to match what we got for FPLLL: ~= 1/2e*beta*log(beta) - beta + 16

> Chen13 seems to match what we got for FPLLL: ~= 1/2e*beta*log(beta) - beta + 16

But it's a super-exponential factor off, no? It's quite a big difference at, say, beta = 500.

sage: def chen13(beta):
....:     # From Fernando's numbers
....:     return 0.270188776350190*beta*log(beta,2.0) - 1.0192050451318417*beta + 16.10253135200765

sage: def fplll(beta):
....:     # These numbers come from [ABFKSW20]
....:     return 1/(2*e) * beta * log(beta, 2.0) - 0.995*beta + 16.25

sage: RR(fplll(500))
343.331....
sage: RR(chen13(500))
717.727....

Or am I missing something?

malb commented

log(beta) vs log(beta, 2) :)

Yep, annoying mathematicians using natural logs everywhere

sage: chen13 = lambda beta:  0.270188776350190*beta*log(beta) - 1.0192050451318417*beta + 16.10253135200765 
....: fplll = lambda beta: 1/(2*e) * beta * log(beta, 2.0) - 0.995*beta + 16.25 
....: RR(fplll(500)) 
....: RR(chen13(500))                                                                                                                                                                                                                                        
343.331928076297
346.058687590423
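The agreement can also be seen directly in the leading coefficients: converting the Chen13 natural-log coefficient to base 2 lands close to the 1/(2e) of the fplll fit. A quick check:

```python
import math

# Chen13 leading coefficient as fitted (natural-log form)
a_ln = 0.270188776350190

# a*beta*ln(beta) == (a*ln 2)*beta*log2(beta), so convert to the
# log2 form used in the fplll/[ABFKSW20] expression
a_log2 = a_ln * math.log(2)

print(a_log2)            # ~0.1873
print(1 / (2 * math.e))  # ~0.1839, the [ABFKSW20] leading coefficient
```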

Edit: to be thorough, I re-checked the curve fit while keeping the decimal digits from Chen's thesis (which are rounded away in the estimator's code); keeping them does not make a meaningful difference:

sage: chen13 = lambda beta:  0.270188776350190*beta*log(beta) - 1.0192050451318417*beta + 16.10253135200765  
....: chen13fp = lambda beta: 0.273273176432764*beta*log(beta) - 1.0400174907733366*beta + 16.915273240747165 
....: fplll = lambda beta: 1/(2*e) * beta * log(beta, 2.0) - 0.995*beta + 16.25  
....: RR(fplll(500))  
....: RR(chen13(500)) 
....: RR(chen13fp(500))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
343.331928076297
346.058687590423
346.049375524385

Annoyingly: should the class be renamed to Chen13? Or, for backwards compatibility, should we add a Chen13 class and keep CheNgu12 as an alias of it, to avoid breaking someone's code? I guess the lattice-estimator still has a moving API.
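For what it's worth, the backwards-compatible alias option could look something like this (hypothetical names and structure, not the estimator's actual class layout):

```python
import math
import warnings

class Chen13:
    """Hypothetical cost model refit to Table 5.2 of Chen's thesis."""

    def __call__(self, beta):
        # log2 cost of SVP enumeration in dimension beta (Chen13 extreme fit)
        return (0.270188776350190 * beta * math.log(beta)
                - 1.0192050451318417 * beta
                + 16.10253135200765)

class CheNgu12(Chen13):
    """Deprecated alias kept so existing user code does not break."""

    def __call__(self, beta):
        warnings.warn("CheNgu12 is deprecated; use Chen13", DeprecationWarning)
        return super().__call__(beta)
```

The subclass-plus-warning pattern lets old code keep running while nudging users toward the new name.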

malb commented

I vote for just renaming it, I don't want to guarantee the API (just yet)

Sounds fair; for enumeration, people should be using newer models anyway, I guess.