RecursiveFeatureElimination error when using a cross-validation splitter
Describe the bug
When I pass a cross-validation splitter instead of an integer value to RecursiveFeatureElimination, I get the error IndexError: list index out of range.
I used the same splitter with the cross-validation function in sklearn and it worked properly. Why does this happen, and is there a solution?
To Reproduce
Steps to reproduce the behavior:
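from feature_engine.selection import RecursiveFeatureElimination
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut
group_cvs = LeaveOneGroupOut()
cvsp = group_cvs.split(X, y, groups)
model = RandomForestRegressor(random_state=123)
sel = RecursiveFeatureElimination(estimator=model, cv=cvsp)  # assumed: this line was missing from the report
sel.fit(X, y)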
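Expected behavior
The selector should fit without error, as it does with an integer cv. Instead, fit raises the following traceback: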
IndexError Traceback (most recent call last)
Cell In[102], line 1
----> 1 sel.fit(X,y)
File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/feature_engine/selection/recursive_feature_elimination.py:179, in RecursiveFeatureElimination.fit(self, X, y)
176 break
178 # remove feature and train new model
--> 179 model_tmp = cross_validate(
180 self.estimator,
181 X_tmp.drop(columns=feature),
182 y,
183 cv=self.cv,
184 scoring=self.scoring,
185 return_estimator=False,
186 )
188 # assign new model performance
189 model_tmp_performance = model_tmp["test_score"].mean()
File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/sklearn/utils/_param_validation.py:214, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
208 try:
209 with config_context(
210 skip_parameter_validation=(
211 prefer_skip_nested_validation or global_skip_validation
212 )
213 ):
--> 214 return func(*args, **kwargs)
215 except InvalidParameterError as e:
216 # When the function is just a wrapper around an estimator, we allow
217 # the function to delegate validation to the estimator, but we replace
218 # the name of the estimator by the name of the function in the error
219 # message to avoid confusion.
220 msg = re.sub(
221 r"parameter of \w+ must be",
222 f"parameter of {func.__qualname__} must be",
223 str(e),
224 )
File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:336, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, return_indices, error_score)
333 if callable(scoring):
334 _insert_error_scores(results, error_score)
--> 336 results = _aggregate_score_dicts(results)
338 ret = {}
339 ret["fit_time"] = results["fit_time"]
File ~/Documents/Learning/Air_Polltion_Map_Milan/myenv/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:2038, in _aggregate_score_dicts(scores)
2009 def _aggregate_score_dicts(scores):
2010 """Aggregate the list of dict to dict of np ndarray
2011
2012 The aggregated output of _aggregate_score_dicts will be a list of dict
(...)
2030 'b': array([10, 2, 3, 10])}
2031 """
2032 return {
2033 key: (
2034 np.asarray([score[key] for score in scores])
2035 if isinstance(scores[0][key], numbers.Number)
2036 else [score[key] for score in scores]
2037 )
-> 2038 for key in scores[0]
2039 }
IndexError: list index out of range
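One possible explanation, offered as an assumption rather than a confirmed diagnosis: `split` returns a generator, which can be iterated only once, and the traceback shows `cross_validate` being called inside a loop in the selector's `fit`. If the same generator is passed to each call, later calls would see no splits, leaving `scores` empty when `scores[0]` is evaluated. The sketch below uses synthetic data and `LinearRegression` (both illustrative choices, not taken from the report) to show the exhaustion and how materializing the splits into a list avoids it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_validate

# Synthetic data (hypothetical, not the reporter's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
groups = np.repeat([0, 1, 2], 10)

logo = LeaveOneGroupOut()

# split() returns a generator: it yields each (train, test) pair only once.
cv_gen = logo.split(X, y, groups)
first_pass = list(cv_gen)   # 3 splits, one per group
second_pass = list(cv_gen)  # empty: the generator is already exhausted

# Materializing the splits into a list makes them reusable across calls.
cv_list = list(logo.split(X, y, groups))
scores_a = cross_validate(LinearRegression(), X, y, cv=cv_list)["test_score"]
scores_b = cross_validate(LinearRegression(), X, y, cv=cv_list)["test_score"]

print(len(first_pass), len(second_pass), len(scores_a), len(scores_b))
# prints: 3 0 3 3
```

If this is indeed the cause, passing `cv=list(group_cvs.split(X, y, groups))` to the selector might work around the error; this has not been verified against the reporter's data.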
Desktop (please complete the following information):
- OS: macOS
Hey @behzad89
Thanks for raising the issue and sorry you are facing problems.
You mention that the splitter works with the sklearn function but not with our selector. Could you add more code to reproduce both cases? If you have a notebook you could share, that would be ideal; otherwise, please copy and paste the code that works and the code that does not.
It will help us a lot to fix this faster.
Thank you!
Hi @solegalli
Thanks for your response. I can create a dummy notebook to reproduce the error and will share it here. Just give me half a day.
Hi @solegalli
As promised, I generated a dummy notebook to check the error. The ZIP file includes the Jupyter Notebook.
Let me know if I can give you a hand with this.