[BUG] Confound removal with newer sklearn versions results in too many user warnings
Closed this issue · 3 comments
Describe the bug
A new version of scikit-learn introduced a check for feature names. With this version, any julearn model that uses confound removal emits a large number of warnings like this:
/Users/fraimondo/anaconda3/envs/julearn/lib/python3.8/site-packages/sklearn/base.py:443: UserWarning: X has feature names, but LinearRegression was fitted without feature names
warnings.warn(
To Reproduce
Steps to reproduce the behavior:
"""
Return Confounds in Confound Removal
====================================
In most cases confound removal is a simple operation.
You regress out the confound from the features and only continue working with
these new confound removed features. This is also the default setting for
julearn's `remove_confound` step. But sometimes you want to work with the
confound even after removing it from the features. In this example, we
will discuss the options you have.
"""
# Authors: Sami Hamdan <s.hamdan@fz-juelich.de>
#
# License: AGPL
from sklearn.datasets import load_diabetes # to load data
from julearn.transformers import ChangeColumnTypes
from julearn import run_cross_validation
import warnings
# load in the data
df_features, target = load_diabetes(return_X_y=True, as_frame=True)
###############################################################################
# First, we can have a look at our features.
# You can see it includes
# Age, BMI, average blood pressure (bp) and 6 other measures from s1 to s6
# Furthermore, it includes sex which will be considered as a confound in
# this example.
#
print('Features: ', df_features.head())
###############################################################################
# Second, we can have a look at the target
print('Target: ', target.describe())
###############################################################################
# Now, we can put both into one DataFrame:
data = df_features.copy()
data['target'] = target
###############################################################################
# In the following we will explore different settings of confound removal
# using julearn's pipeline functionalities.
#
# Confound Removal Typical Use Case
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here, we want to deconfound the features and not include the confound as a
# feature into our last model.
# Afterwards, we will transform our features with a pca and run
# a linear regression.
#
feature_names = list(df_features.drop(columns='sex').columns)
scores, model = run_cross_validation(
    X=feature_names, y='target', data=data,
    confounds='sex', model='linreg', problem_type='regression',
    preprocess_X=['remove_confound', 'pca'],
    return_estimator='final')
Workaround for the moment:
with warnings.catch_warnings():
    # Report warnings raised at line 443 (sklearn/base.py in the message above) only once
    warnings.simplefilter("once", lineno=443)
    scores, model = run_cross_validation(
        X=feature_names, y='target', data=data,
        confounds='sex', model='linreg', problem_type='regression',
        preprocess_X=['remove_confound', 'pca'],
        return_estimator='final')
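Filtering on `lineno=443` depends on where the `warnings.warn` call sits in the installed scikit-learn, so it may stop matching after an upgrade. A possible alternative, sketched here with only the standard library, is to match on the message text instead. `noisy_step` is a hypothetical stand-in for the julearn call; the regex is an assumption based on the warning quoted in this issue.

```python
import warnings

# Regex for the warning text quoted in this issue (assumed to stay stable).
FEATURE_NAMES_MSG = r"X has feature names, but \w+ was fitted without feature names"


def noisy_step():
    """Hypothetical stand-in for a pipeline step that triggers the warning."""
    warnings.warn(
        "X has feature names, but LinearRegression was fitted without feature names",
        UserWarning,
    )
    warnings.warn("some unrelated warning", UserWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # baseline: report every warning
    # Added last, so it is checked first: drop only the feature-name warning.
    warnings.filterwarnings(
        "ignore", message=FEATURE_NAMES_MSG, category=UserWarning
    )
    result = noisy_step()

# Only the unrelated warning should have been recorded.
remaining = [str(w.message) for w in caught]
print(remaining)
```

Because the filter is scoped to one message, other `UserWarning`s still reach the user. Like the workaround above, it only affects the current process.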
Solution to use when joblib is used:
import sys
if not sys.warnoptions:
    import os, warnings
    warnings.simplefilter("ignore")  # Change the filter in this process
    os.environ["PYTHONWARNINGS"] = "ignore"  # Also affect subprocesses
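Setting `PYTHONWARNINGS` to a bare `ignore` silences every warning in child processes. A narrower value such as `ignore::UserWarning` may be enough here. The sketch below checks that behaviour with a plain subprocess rather than joblib, on the assumption that worker processes inherit the parent's environment; `run_child` and `CHILD_CODE` are illustrative names.

```python
import os
import subprocess
import sys

# Code executed in the child: emit one UserWarning, as the sklearn check does.
CHILD_CODE = "import warnings; warnings.warn('X has feature names', UserWarning)"


def run_child(pythonwarnings):
    """Run CHILD_CODE in a fresh interpreter and return what it wrote to stderr."""
    env = dict(os.environ)
    env["PYTHONWARNINGS"] = pythonwarnings  # read by the child at startup
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.stderr


loud = run_child("default")               # warning is printed to stderr
quiet = run_child("ignore::UserWarning")  # UserWarning is suppressed in the child

print("UserWarning" in loud, "UserWarning" in quiet)
```

The `action::category` form keeps other categories, such as `DeprecationWarning`, visible in the workers.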
@samihamdan Is this fixed for the moment? Will it be fixed for 0.3.0?