juaml/julearn

Error while using column names with '+' eg: 'sepal+length'


Describe the bug
run_cross_validation raises an error when a column name contains '+', e.g. 'sepal+length'.

To Reproduce

The following code reproduces the error:

import pandas as pd
import numpy as np
from seaborn import load_dataset
from julearn import run_cross_validation
from julearn.utils import configure_logging

df_iris = load_dataset('iris')
df_iris = df_iris.rename(columns={'sepal_length': 'sepal+length'}) # replaced underscore with '+'

X = ['sepal+length', 'sepal_width', 'petal_length']
y = 'species'

scores, model_iris = run_cross_validation(
    X=X, y=y, data=df_iris, model='svm', preprocess_X='zscore',
    problem_type='multiclass_classification', scoring=['accuracy'],
    return_estimator='final')

Expected behavior
It should have run without error

Screenshots
image

System (please complete the following information):

  • OS: macOS and Linux
  • julearn version: 0.2.5.dev

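Each entry of X is treated as a regular expression, so 'sepal+length' is read as a pattern rather than a literal column name. In a regular expression the + is a quantifier ("one or more of the preceding character"), so the pattern does not match a column that is actually named sepal+length. To match the literal character, escape the + with a backslash.

Your X should look like this (the r prefix makes it a raw string, so Python does not try to interpret the backslash itself):
X = [r'sepal\+length', 'sepal_width', 'petal_length']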
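
To see why the unescaped name fails, here is a minimal sketch that uses only Python's standard re module. It assumes julearn compares each entry of X against the column names roughly the way re.fullmatch does; that detail is an assumption for illustration and is not confirmed in this thread. The variable names are hypothetical.

```python
import re

# Column names as they appear in the renamed DataFrame.
columns = ['sepal+length', 'sepal_width', 'petal_length']

# Unescaped: '+' is a quantifier meaning "one or more of the preceding 'l'".
unescaped = 'sepal+length'
print(re.fullmatch(unescaped, 'sepal+length'))              # None: the literal '+' is not matched
print(re.fullmatch(unescaped, 'sepallength') is not None)   # True: 'l' repeated, no '+'

# Escaped: re.escape puts a backslash in front of regex metacharacters.
escaped = re.escape('sepal+length')
print(escaped)                                              # sepal\+length

# Assumption: full-match semantics stand in for julearn's column lookup.
matched = [c for c in columns if re.fullmatch(escaped, c)]
print(matched)                                              # ['sepal+length']

# Escaping every name up front avoids having to spot special characters by hand.
X = [re.escape(c) for c in columns]
```

Building X with re.escape is handy when column names come from a file and may contain other metacharacters such as '.', '(' or '*'.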