feature-engine/feature_engine

RecursiveFeatureElimination error when using a cross-validation splitter


Describe the bug
When I pass a cross-validation splitter instead of an integer to RecursiveFeatureElimination, I get IndexError: list index out of range. I used the same splitter with the cross-validation function in sklearn and it worked properly. Why does this happen, and is there a solution?

To Reproduce

Steps to reproduce the behavior:
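from feature_engine.selection import RecursiveFeatureElimination
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut
group_cvs = LeaveOneGroupOut()
cvsp = group_cvs.split(X, y, groups)  # generator of (train, test) index pairs
model = RandomForestRegressor(random_state=123)
sel = RecursiveFeatureElimination(estimator=model, cv=cvsp)  # assumed: this line was missing from the report
sel.fit(X, y)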

Expected behavior

IndexError                                Traceback (most recent call last)
Cell In[102], line 1
----> 1 sel.fit(X,y)

File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/feature_engine/selection/recursive_feature_elimination.py:179, in RecursiveFeatureElimination.fit(self, X, y)
    176     break
    178 # remove feature and train new model
--> 179 model_tmp = cross_validate(
    180     self.estimator,
    181     X_tmp.drop(columns=feature),
    182     y,
    183     cv=self.cv,
    184     scoring=self.scoring,
    185     return_estimator=False,
    186 )
    188 # assign new model performance
    189 model_tmp_performance = model_tmp["test_score"].mean()

File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/sklearn/utils/_param_validation.py:214, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    208 try:
    209     with config_context(
    210         skip_parameter_validation=(
    211             prefer_skip_nested_validation or global_skip_validation
    212         )
    213     ):
--> 214         return func(*args, **kwargs)
    215 except InvalidParameterError as e:
    216     # When the function is just a wrapper around an estimator, we allow
    217     # the function to delegate validation to the estimator, but we replace
    218     # the name of the estimator by the name of the function in the error
    219     # message to avoid confusion.
    220     msg = re.sub(
    221         r"parameter of \w+ must be",
    222         f"parameter of {func.__qualname__} must be",
    223         str(e),
    224     )

File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:336, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, return_indices, error_score)
    333 if callable(scoring):
    334     _insert_error_scores(results, error_score)
--> 336 results = _aggregate_score_dicts(results)
    338 ret = {}
    339 ret["fit_time"] = results["fit_time"]

File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:2038, in _aggregate_score_dicts(scores)
   2009 def _aggregate_score_dicts(scores):
   2010     """Aggregate the list of dict to dict of np ndarray
   2011 
   2012     The aggregated output of _aggregate_score_dicts will be a list of dict
   (...)
   2030      'b': array([10, 2, 3, 10])}
   2031     """
   2032     return {
   2033         key: (
   2034             np.asarray([score[key] for score in scores])
   2035             if isinstance(scores[0][key], numbers.Number)
   2036             else [score[key] for score in scores]
   2037         )
-> 2038         for key in scores[0]
   2039     }

IndexError: list index out of range
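
A possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: `LeaveOneGroupOut().split(...)` returns a generator, and a generator can be iterated only once. If the selector calls `cross_validate` more than once with the same `cv` object, every call after the first would see no splits, leaving `scores` empty when `scores[0]` is evaluated. The sketch below uses synthetic data (the reporter's `X`, `y` and `groups` are not shown) and only demonstrates that the generator is empty on a second pass.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-ins for the reporter's data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = rng.normal(size=12)
groups = np.repeat([0, 1, 2], 4)  # three groups of four samples each

# As in the report: split() returns a generator, not a reusable object.
cvsp = LeaveOneGroupOut().split(X, y, groups)

first_pass = list(cvsp)   # consumes the generator: one split per group
second_pass = list(cvsp)  # the generator is exhausted, so this is empty

print(len(first_pass), len(second_pass))  # prints: 3 0
```

A single call to sklearn's cross-validation function consumes the generator exactly once, which would be consistent with the same splitter working there.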

Desktop (please complete the following information):

  • OS: macOS

Hey @behzad89

Thanks for raising the issue and sorry you are facing problems.

You mention that the splitter works with the sklearn function but not with our selector. Could you add more code to reproduce both cases? Do you have a notebook you could share? Or could you copy and paste the code that works and the code that does not?

It will help us a lot to fix this faster.

Thank you!
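
For anyone wanting to compare the two cases without the reporter's data, a minimal self-contained sketch might look like the following. The synthetic data, the scoring metric and the variable names are all assumptions, and `cross_validate` is called twice on different feature subsets only to imitate a selector that re-evaluates the model repeatedly. It does not use feature_engine itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, cross_validate

# Synthetic stand-ins for the reporter's data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X[:, 0] + rng.normal(scale=0.1, size=30)
groups = np.repeat([0, 1, 2], 10)  # three groups of ten samples each

model = RandomForestRegressor(n_estimators=10, random_state=123)
logo = LeaveOneGroupOut()

# Case 1: a generator passed to a single cross_validate call works,
# because the generator is consumed exactly once.
once = cross_validate(
    model, X, y, cv=logo.split(X, y, groups), scoring="neg_mean_squared_error"
)

# Case 2: materialise the splits as a list so they can be reused
# across several cross_validate calls.
splits = list(logo.split(X, y, groups))
repeated = [
    cross_validate(
        model, X[:, :n_features], y, cv=splits, scoring="neg_mean_squared_error"
    )["test_score"].mean()
    for n_features in (4, 3)  # imitate dropping one feature between calls
]

print(len(once["test_score"]), len(repeated))
```

If the repeated-call explanation holds, passing a list such as `splits` as `cv` to the selector may avoid the error, since the splits index rows and stay valid when columns are dropped. That is a suggestion to try, not a confirmed fix.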

Hi @solegalli

Thanks for your response. I can create a dummy notebook to reproduce the error and will share it here. Just give me half a day.

Awesome, thank you @behzad89

Hi @solegalli

As promised, I put together a dummy notebook that reproduces the error. The ZIP file includes the Jupyter Notebook.

Let me know if I can give you a hand with this.

sample_code.zip