mathurinm/celer

ENH: slow solver on large scale problems with majority of features screened

mathurinm opened this issue · 1 comment

Reproduction on the finance dataset:

import time
import libsvmdata
import numpy as np
from numpy.linalg import norm
from celer import Lasso 

X, y = libsvmdata.fetch_libsvm("finance", min_nnz=3)
alpha_max = norm(X.T @ y, ord=np.inf) / len(y)

t0 = time.time()
clf = Lasso(alpha=alpha_max/20, fit_intercept=False, verbose=True).fit(X, y)
dur = time.time() - t0
print(f"{dur:.2f} seconds")

The first feature is highly correlated with y, and the support is small. Most features get screened after the first iterations, so later iterations should be much faster than the early ones, but they are not.

In [19]: t0 = time.time(); clf = Lasso(alpha=alpha_max/20, fit_intercept=False, verbose=True).fit(X, y); dur = time.time() - t0
#########################
##### Computing alpha 1/1
#########################
Iter 0: primal 6.3741726822, gap 5.75e+00, 10 feats in subpb (9089 left)
Iter 1: primal 0.8647719451, gap 7.29e-02, 4 feats in subpb (162 left)
Iter 2: primal 0.8227823469, gap 1.96e-02, 6 feats in subpb (53 left)
Iter 3: primal 0.8144988993, gap 5.66e-03, 4 feats in subpb (14 left)
Iter 4: primal 0.8132372683, gap 1.63e-03, 4 feats in subpb (8 left)
Iter 5: primal 0.8130029142, gap 4.61e-04, 4 feats in subpb (6 left)
Iter 6: primal 0.8129717566, gap 1.35e-04, 3 feats in subpb (3 left)
Iter 7: primal 0.8129684005, gap 3.84e-05
Early exit, gap: 3.84e-05 < 1.00e-04
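
For readers less familiar with screening, here is a minimal NumPy sketch of a Gap Safe sphere test, assuming that this is the kind of rule behind the "left" counts in the log. It is not celer's implementation: the function name `gap_safe_screen` is hypothetical, the data is a small dense random matrix, and the objective is assumed to be 0.5 * ||y - X w||^2 + lam * ||w||_1, with `lam` playing the role of `len(y) * alpha` in the snippet above. The point is only that once the gap is small, most features pass the test and no longer need to be touched.

```python
import numpy as np


def gap_safe_screen(X, y, w, lam):
    """Return (mask, gap); mask[j] is True if feature j can be safely discarded.

    Assumed objective: 0.5 * ||y - X w||^2 + lam * ||w||_1
    (hypothetical helper, dense X only, not celer's code).
    """
    R = y - X @ w
    # Rescale the residual into a dual feasible point: ||X.T @ theta||_inf <= 1
    theta = R / max(lam, np.max(np.abs(X.T @ R)))
    primal = 0.5 * R @ R + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)
    # Feature j is inactive at the optimum if |x_j^T theta| + radius * ||x_j|| < 1
    radius = np.sqrt(2 * gap) / lam
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1, gap


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
y = rng.standard_normal(50)
lam_max = np.max(np.abs(X.T @ y))

# Above lam_max the solution is 0 and the gap at w = 0 vanishes,
# so every feature passes the test.
mask, gap = gap_safe_screen(X, y, np.zeros(200), 2 * lam_max)
print(mask.sum(), "of", mask.size, "features screened, gap", gap)
```

With a rule of this kind, the cost of computing `X.T @ theta` and the gap should shrink with the number of remaining features, which is why the later iterations in the log above would be expected to be cheap.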

@QB3 this is related to our work.