mathurinm/celer

"Inner solver did not converge" problem

ksehic opened this issue · 4 comments

Hi,

I was using the celer Lasso model with sparse-ho for the weighted Lasso case. For some alphas, it generates the warning "Inner solver did not converge" and keeps repeating it indefinitely... It would be nice to at least have the procedure terminate when this happens, if the underlying issue is hard to fix.

This is the code:

import numpy as np
from sklearn.utils import check_random_state
from celer import Lasso
from scipy.linalg import toeplitz
from numpy.linalg import norm
from sklearn.model_selection import train_test_split

from sparse_ho.models import WeightedLasso
from sparse_ho.criterion import HeldOutMSE

n_samples, n_features = 100, 50

rng = check_random_state(0)
X = rng.multivariate_normal(size=n_samples, mean=np.zeros(n_features),
                            cov=toeplitz(0.5 ** np.arange(n_features)))

w_true = np.zeros(n_features)
w_true[::3] = (-1) ** np.arange(n_features // 3 + 1)

noise = rng.randn(n_samples)
y = X @ w_true
y += noise / norm(noise) * 0.5 * norm(y)

X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

all_indices = np.arange(X_train_val.shape[0])
train_indices, val_indices = train_test_split(
    all_indices, test_size=0.5, random_state=42)

X_train = X_train_val[train_indices, :]
X_val = X_train_val[val_indices, :]

y_train = y_train_val[train_indices]
y_val = y_train_val[val_indices]

alpha_max = np.max(np.abs(X.T @ y)) / n_samples

estimator = Lasso(fit_intercept=False, max_iter=1e6,
                  warm_start=True, max_epochs=1e6)
model = WeightedLasso(estimator=estimator)
criterion = HeldOutMSE(train_indices, val_indices)

# validation MSE as a function of the per-feature log-alphas
objective_x = lambda config_x: criterion.get_val(model, X, y,
                                                 log_alpha=config_x, tol=1e-7)

alpha_up = np.log(alpha_max)
alpha_low = np.log(alpha_max/10**3)
dim_alpha = X_train.shape[1]

and these are two alpha vectors for which it happens:

alpha_1.txt
alpha_2.txt

To trigger the warning, run objective_x on alpha_1 or alpha_2.
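For instance (a sketch; alpha_1.txt is the attachment above, loaded as the per-feature log-alpha vector that objective_x expects):

log_alpha_1 = np.genfromtxt("alpha_1.txt")  # attached log-alphas
objective_x(log_alpha_1)  # keeps printing "Inner solver did not converge"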

Thanks for the feedback @ksehic. If I run your snippet, everything works fine, because no Lasso is actually trained.

Do you think you can isolate an alpha, an X and a y such that celer.Lasso(alpha=alpha).fit(X, y) fails? You can upload such X and y as .npy files; there is no need to know how they were generated in order to investigate.
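For instance (a minimal sketch; the file names are only suggestions):

import numpy as np

# save the failing inputs so the case can be reproduced in isolation
np.save("X_fail.npy", X_train)
np.save("y_fail.npy", y_train)

# anyone can then reproduce with:
X = np.load("X_fail.npy")
y = np.load("y_fail.npy")
Lasso(alpha=alpha).fit(X, y)  # alpha: the failing scalar value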

Hi @mathurinm, sorry for not following. The two alphas I have provided are p-dimensional (one value per feature), while the alpha of the plain Lasso model in celer is a scalar. But I can try your suggestion...

The problem arises when we combine celer's Lasso as the estimator with sparse-ho and the weighted Lasso. I have updated the script with the sparse-ho functions.

I do not know how you made it run: if I run objective_x on these two alphas, "!!! Inner solver did not converge at epoch 999999, gap: 4.03e-04 > 1.95e-06" just keeps repeating... Maybe it ran in your case because of warm starting; I have noticed that when the Lasso model remembers a good previous estimate, the problem does not happen.
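Something like this (a sketch of the warm-start effect I mean; celer's Lasso follows the scikit-learn warm_start convention, and weights_a / weights_b are hypothetical nearby weight vectors):

clf = Lasso(fit_intercept=False, warm_start=True, weights=weights_a)
clf.fit(X_train, y_train)  # first fit starts from coef_ = 0
clf.weights = weights_b    # nearby weight vector
clf.fit(X_train, y_train)  # restarts from the previous coef_ and converges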

The example:

[screenshot: console output with the "!!! Inner solver did not converge" warning repeating]

Thanks Kenan for the very valuable feedback, I think you have found a bug.
Here is a smaller snippet (I removed the dependency on sparse-ho to make sure that the problem comes from celer's side):

import numpy as np
from sklearn.utils import check_random_state
from celer import Lasso
from scipy.linalg import toeplitz
from numpy.linalg import norm
from sklearn.model_selection import train_test_split


n_samples, n_features = 100, 50

rng = check_random_state(0)
X = rng.multivariate_normal(size=n_samples, mean=np.zeros(n_features),
                            cov=toeplitz(0.5 ** np.arange(n_features)))

w_true = np.zeros(n_features)
w_true[::3] = (-1) ** np.arange(n_features // 3 + 1)

noise = rng.randn(n_samples)
y = X @ w_true
y += noise / norm(noise) * 0.5 * norm(y)

X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

all_indices = np.arange(X_train_val.shape[0])
train_indices, val_indices = train_test_split(
    all_indices, test_size=0.5, random_state=42)

X_train = X_train_val[train_indices, :]
y_train = y_train_val[train_indices]

# alpha_1.txt stores log-alphas, hence the exp
weights = np.exp(np.genfromtxt("alpha_1.txt"))

clf = Lasso(fit_intercept=False, max_iter=20, warm_start=True,
            max_epochs=1e6, verbose=1, weights=weights).fit(X_train, y_train)

My guess is that a feature is wrongly screened (discarded from the problem because the solver believes its coefficient is certainly 0 at the optimum), hence the solver misses this feature and cannot converge; a sketch of the screening rule follows the log below:

#########################
##### Computing alpha 1/1
#########################
Iter 0: primal 6.0335620031, gap 6.03e+00, 10 feats in subpb (50 left)
Iter 1: primal 4.2495449392, gap 4.23e+00, 20 feats in subpb (50 left)
Iter 2: primal 3.3997480696, gap 3.35e+00, 40 feats in subpb (50 left)
Iter 3: primal 0.8432055459, gap 7.96e-01, 50 feats in subpb (50 left)
Iter 4: primal 0.7385780183, gap 2.36e-01, 47 feats in subpb (47 left)
Iter 5: primal 0.7365933429, gap 5.83e-02, 47 feats in subpb (47 left)
Iter 6: primal 0.7362112038, gap 6.17e-04, 39 feats in subpb (39 left)
Iter 7: primal 0.7358465775, gap 2.53e-04, 33 feats in subpb (33 left)
!!! Inner solver did not converge at epoch 999999, gap: 1.95e-03 > 7.58e-05
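For context, this is roughly the rule being applied (a NumPy sketch of gap safe screening, with constants for the objective 0.5 * ||y - X w||^2 + alpha * ||w||_1; celer's actual implementation lives in cython_utils.pyx and differs in details):

import numpy as np

def screened_features(X, y, w, alpha):
    # residual-based dual point, rescaled so that it is dual feasible:
    # ||X.T @ theta||_inf <= 1
    R = y - X @ w
    theta = R / max(alpha, np.max(np.abs(X.T @ R)))
    # duality gap of the current primal/dual pair
    primal = 0.5 * R @ R + alpha * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * alpha ** 2 * ((y / alpha - theta) ** 2).sum()
    radius = np.sqrt(2 * (primal - dual)) / alpha
    # feature j is provably inactive if the safe ball around theta stays
    # strictly inside its dual constraint |x_j.T theta| <= 1:
    prios = (1 - np.abs(X.T @ theta)) / np.linalg.norm(X, axis=0)
    return prios > radius  # True -> coefficient certified to be 0

A bug anywhere in this certification (for instance a radius computed too small) would discard a feature that is actually active, which matches the behaviour above.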

Indeed, after disabling screening in the Cython code (cython_utils.pyx):

        if prios[j] > radius:
            pass  # screening disabled for debugging
            # screened[j] = True
            # n_screened[0] += 1

I get a converging solver:

#########################
##### Computing alpha 1/1
#########################
Iter 0: primal 6.0335620031, gap 6.03e+00, 10 feats in subpb (50 left)
Iter 1: primal 4.2495449392, gap 4.23e+00, 20 feats in subpb (50 left)
Iter 2: primal 3.3997480696, gap 3.35e+00, 40 feats in subpb (50 left)
Iter 3: primal 0.8432055459, gap 7.96e-01, 50 feats in subpb (50 left)
Iter 4: primal 0.7385780183, gap 2.36e-01, 50 feats in subpb (50 left)
Iter 5: primal 0.7371412690, gap 8.48e-03, 50 feats in subpb (50 left)
Iter 6: primal 0.7365109799, gap 9.14e-04, 50 feats in subpb (50 left)
Iter 7: primal 0.7359437030, gap 3.47e-04, 50 feats in subpb (50 left)
Iter 8: primal 0.7356806570, gap 8.94e-06
Early exit, gap: 8.94e-06 < 1.00e-04

Great @mathurinm! Thank you!