juaml/julearn

[BUG] Confound removal with newer sklearn versions results in too many user warnings


Describe the bug

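scikit-learn 1.0 introduced a check for feature names: an estimator that was fitted on data without column names now warns when it is later given a pandas DataFrame. Since this change, any julearn model with confound removal emits the same warning over and over, like this: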

```
/Users/fraimondo/anaconda3/envs/julearn/lib/python3.8/site-packages/sklearn/base.py:443: UserWarning: X has feature names, but LinearRegression was fitted without feature names
  warnings.warn(
```

To Reproduce
Run the following script (adapted from julearn's "Return Confounds in Confound Removal" example):

"""
Return Confounds in Confound Removal
====================================

In most cases confound removal is a simple operation.
You regress out the confound from the features and only continue working with
these new confound removed features. This is also the default setting for
julearn's `remove_confound` step. But sometimes you want to work with the
confound even after removing it from the features. In this example, we
will discuss the options you have.

"""
# Authors: Sami Hamdan <s.hamdan@fz-juelich.de>
#
# License: AGPL
from sklearn.datasets import load_diabetes  # to load data
from julearn.transformers import ChangeColumnTypes
from julearn import run_cross_validation
import warnings

# load in the data
df_features, target = load_diabetes(return_X_y=True, as_frame=True)


###############################################################################
# First, we can have a look at our features.
# You can see it includes
# Age, BMI, average blood pressure (bp) and 6 other measures from s1 to s6
# Furthermore, it includes sex which will be considered as a confound in
# this example.
#
print('Features: ', df_features.head())

###############################################################################
# Second, we can have a look at the target
print('Target: ', target.describe())

###############################################################################
# Now, we can put both into one DataFrame:
data = df_features.copy()
data['target'] = target

###############################################################################
# In the following we will explore different settings of confound removal
# using Julearns pipeline functionalities.
#
# Confound Removal Typical Use Case
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here, we want to deconfound the features and not include the confound as a
# feature into our last model.
# Afterwards, we will transform our features with a pca and run
# a linear regression.
#
feature_names = list(df_features.drop(columns='sex').columns)

scores, model = run_cross_validation(
    X=feature_names, y='target', data=data,
    confounds='sex', model='linreg', problem_type='regression',
    preprocess_X=['remove_confound', 'pca'],
    return_estimator='final')
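To quantify "too many warnings" (and to check later whether a fix or filter works), the warnings of a run can be recorded and counted instead of printed. A minimal standard-library sketch; `noisy_step` is a hypothetical stand-in for the `run_cross_validation(...)` call above so the snippet runs without julearn or scikit-learn.

```python
import warnings
from collections import Counter


def noisy_step(n_calls):
    """Hypothetical stand-in for run_cross_validation: emits the same
    UserWarning once per simulated predict call."""
    for _ in range(n_calls):
        warnings.warn(
            "X has feature names, but LinearRegression was fitted "
            "without feature names",
            UserWarning,
        )


def count_warnings(func, *args, **kwargs):
    """Run func and return (result, Counter of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # keep duplicates instead of deduplicating
        result = func(*args, **kwargs)
    return result, Counter(str(w.message) for w in caught)


_, counts = count_warnings(noisy_step, 25)
for message, n in counts.items():
    print(f"{n} x {message}")
```

With the real pipeline one would pass `run_cross_validation` and its keyword arguments to `count_warnings` instead of `noisy_step`.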

Additional context

Workaround for the moment (print the warning only once, filtering on the line that emits it):

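```python
import warnings

with warnings.catch_warnings():
    # "once": print only the first matching warning.
    # lineno=443 is the line in sklearn/base.py that emits it in the
    # scikit-learn version shown above; it may differ in other versions.
    warnings.simplefilter("once", lineno=443)
    scores, model = run_cross_validation(
        X=feature_names, y='target', data=data,
        confounds='sex', model='linreg', problem_type='regression',
        preprocess_X=['remove_confound', 'pca'],
        return_estimator='final')
```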
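The workaround above keys on `lineno=443`, a line number inside the installed `sklearn/base.py`, which is likely to move between scikit-learn releases. A sketch of an alternative that matches on the message text instead and leaves other warnings visible; `emit` is a hypothetical stand-in for the scikit-learn call site. In real use, the `run_cross_validation(...)` call would go inside the same `catch_warnings` block (without `record=True`).

```python
import warnings

# Regex matched against the start of the warning message (case-insensitive)
FEATURE_NAME_MSG = r"X has feature names, but \w+ was fitted without feature names"


def emit():
    """Hypothetical stand-in for the sklearn call site: emits the
    feature-name warning plus an unrelated one."""
    warnings.warn(
        "X has feature names, but LinearRegression was fitted "
        "without feature names",
        UserWarning,
    )
    warnings.warn("an unrelated warning", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # show everything by default ...
    # ... except the feature-name warning, matched by message, not line number
    warnings.filterwarnings("ignore", message=FEATURE_NAME_MSG, category=UserWarning)
    for _ in range(10):
        emit()

remaining = [str(w.message) for w in caught]
print(remaining)
```

`filterwarnings` inserts its rule at the front of the filter list, so the `ignore` rule is checked before the catch-all `always` rule.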

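Solution to use when joblib is used (joblib workers are separate processes, so filters set with `warnings` in the main process do not reach them):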

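```python
import sys

# Only override when no -W option or PYTHONWARNINGS setting was given.
# Note: this silences *all* warnings, not just the feature-name one.
if not sys.warnoptions:
    import os, warnings
    warnings.simplefilter("ignore")  # Change the filter in this process
    os.environ["PYTHONWARNINGS"] = "ignore"  # Also affect subprocesses
```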
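The snippet above turns off every warning, in this process and in child processes started afterwards (such as joblib workers, which typically inherit the parent's environment). A narrower sketch, assuming the standard `PYTHONWARNINGS` syntax `action:message:category:module:lineno`, silences only the feature-name warning. A plain child interpreter stands in for a joblib worker so the effect can be checked; in real use one would set `os.environ["PYTHONWARNINGS"]` before calling `run_cross_validation`, as in the original snippet.

```python
import os
import subprocess
import sys

# Silence only this UserWarning; other warnings still reach stderr.
# In PYTHONWARNINGS the message field is a literal prefix (case-insensitive).
env = dict(os.environ)
env["PYTHONWARNINGS"] = "ignore:X has feature names:UserWarning"

# A child interpreter stands in for a joblib worker process here.
child_code = (
    "import warnings\n"
    "warnings.warn('X has feature names, but LinearRegression was fitted "
    "without feature names', UserWarning)\n"
    "warnings.warn('something else', UserWarning)\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
)
print(result.stderr)
```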

@samihamdan Is this fixed for the moment? Will it be fixed for 0.3.0?

Solved in #154 and #183.