mathurinm/celer

Allow fitting an intercept in `Lasso.path()`

jolars opened this issue · 3 comments

It would be nice if Lasso.path() (celer_path()) could fit an intercept.

Hi @jolars ! The design of celer_path follows that of scikit-learn, in order to keep our Lasso class as simple as possible (we do not override methods other than __init__ and path).

scikit-learn computes the intercept in the fit method by first centering X and y: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/linear_model/_coordinate_descent.py#L1008
The intercept is then recovered after the coefficients have been computed: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/linear_model/_coordinate_descent.py#L1084
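As a quick sanity check of that two-step scheme (a sketch using scikit-learn's public API, not celer), one can verify that the fitted intercept matches the centering formula `intercept_ = y.mean() - X.mean(axis=0) @ coef_`:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 5) + 3.0          # non-centered features
y = X @ rng.randn(5) + 2.0          # non-zero intercept

model = Lasso(alpha=0.1, fit_intercept=True).fit(X, y)

# the intercept is recovered from the coefficients and the column means
recovered = y.mean() - X.mean(axis=0) @ model.coef_
print(np.allclose(model.intercept_, recovered))  # True
```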

So I guess it would not be too complicated to wrap celer_path() in the same fashion

Indeed, it wouldn't be difficult to implement such a feature in celer_path. Yet, since we expose the same API as scikit-learn, we'd better not do it, so as to abide by their convention.

That being said, here is a function that implements the requested feature:

from celer import celer_path


def custom_celer_path(X, y, **kwargs):
    X_cols_mean = X.mean(axis=0)
    y_mean = y.mean()

    X_centered = X - X_cols_mean
    y_centered = y - y_mean

    # solve the Lasso path on the centered problem (no intercept needed)
    _, coefs, _ = celer_path(X_centered, y_centered, pb='lasso',
                             return_n_iter=False, return_thetas=False,
                             **kwargs)
    # recover one intercept per alpha through the link equation
    intercepts = y_mean - coefs.T @ X_cols_mean

    return intercepts, coefs

One can show that solving a Lasso with an intercept is equivalent to solving the intercept-free problem on y and X centered (feature-wise), with the intercept recovered through the link equation

\beta_0 = \bar{y} - \beta^\top \bar{x}

where \bar{y} is the mean of y and \bar{x} is the vector of column means of X.
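For completeness, here is a short derivation of the intercept formula (a sketch under the standard Lasso objective, where \bar{y} and \bar{x} denote the means of y and of the columns of X):

\min_{\beta_0, \beta} \frac{1}{2n} \| y - \beta_0 \mathbf{1} - X \beta \|_2^2 + \lambda \| \beta \|_1

Setting the partial derivative with respect to \beta_0 to zero gives

\mathbf{1}^\top (y - \beta_0 \mathbf{1} - X \beta) = 0 \iff \beta_0 = \bar{y} - \bar{x}^\top \beta

and substituting \beta_0 back into the objective yields the centered problem

\min_{\beta} \frac{1}{2n} \| (y - \bar{y} \mathbf{1}) - (X - \mathbf{1} \bar{x}^\top) \beta \|_2^2 + \lambda \| \beta \|_1.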

Here is a little snippet to check/play with the function:

import numpy as np
from numpy.linalg import norm
from celer import celer_path, Lasso
from celer.datasets import make_correlated_data

# fix random state for reproducibility
random_state = 123
rnd = np.random.RandomState(random_state)

n_samples, n_features = 1000, 100
X, y, _ = make_correlated_data(n_samples, n_features, random_state=random_state)

# shift some feature columns and y so that the intercept is non-zero
for j in rnd.choice(n_features, 10):
    X[:, j] += rnd.uniform(5, 10)
y += rnd.uniform(5, 10)


tol = 1e-14
eps, n_alphas = 1e-3, 100
alpha_max = norm(X.T@y, ord=np.inf) / n_samples
alphas = alpha_max * np.geomspace(1, eps, n_alphas)

# fit with celer_path
intercepts, coefs = custom_celer_path(X, y, alphas=alphas, tol=tol)

# fit alphas individually with Lasso
ind_intercepts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, fit_intercept=True, tol=tol)
    model.fit(X, y)

    ind_intercepts.append(model.intercept_)

np.testing.assert_allclose(intercepts, ind_intercepts, atol=1e-7)

Note that custom_celer_path is faster than fitting each alpha individually with Lasso, since it solves the problems sequentially along the given path, warm-starting each solve with the previous solution.

Thanks @Badr-MOUFAD ! For sparse datasets we can't center X explicitly, as that would break sparsity, but the same kind of technique works by handling the column means implicitly. It has been done in benchopt/benchmark_lasso_path#3 for example.
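To illustrate the implicit-centering idea (a sketch using scikit-learn, whose sparse coordinate-descent solver handles the column means without materializing a centered matrix), fitting on sparse input with fit_intercept=True matches the dense fit:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X_dense = rng.randn(100, 20)
X_dense[X_dense < 0.5] = 0.0        # make the design sparse
X_sparse = sparse.csc_matrix(X_dense)
y = X_dense @ rng.randn(20) + 3.0   # non-zero intercept

dense_fit = Lasso(alpha=0.1, fit_intercept=True, tol=1e-10,
                  max_iter=50_000).fit(X_dense, y)
sparse_fit = Lasso(alpha=0.1, fit_intercept=True, tol=1e-10,
                   max_iter=50_000).fit(X_sparse, y)

# the sparse solver centers implicitly: same coefficients and intercept,
# and X_sparse is never densified
print(np.allclose(dense_fit.coef_, sparse_fit.coef_, atol=1e-6))
print(np.allclose(dense_fit.intercept_, sparse_fit.intercept_, atol=1e-6))
```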