dmlc/xgboost

Incompatibility between scikit-learn and xgboost

Opened this issue · 5 comments

I have xgboost 2.1.3 and scikit-learn 1.6.0.
After running this code
grid_search = GridSearchCV(XGBClassifier(objective='binary:logistic'), param_grid, scoring='accuracy', cv=5, verbose=1)
grid_search.fit(X_train, y_train)

I got the following error:


AttributeError Traceback (most recent call last)

Cell In[103], line 6
5 grid_search = GridSearchCV(XGBClassifier(objective='binary:logistic'), param_grid, scoring='accuracy', cv=5, verbose=1)

----> 6 grid_search.fit(X_train, y_train)

File ~/lab3/lib/python3.11/site-packages/sklearn/base.py:1389, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1382 estimator._validate_params()
1384 with config_context(
1385 skip_parameter_validation=(
1386 prefer_skip_nested_validation or global_skip_validation
1387 )
1388 ):
-> 1389 return fit_method(estimator, *args, **kwargs)

File ~/lab3/lib/python3.11/site-packages/sklearn/model_selection/_search.py:932, in BaseSearchCV.fit(self, X, y, **params)
928 params = _check_method_params(X, params=params)
930 routed_params = self._get_routed_params_for_fit(params)
--> 932 cv_orig = check_cv(self.cv, y, classifier=is_classifier(estimator))
933 n_splits = cv_orig.get_n_splits(X, y, **routed_params.splitter.split)
935 base_estimator = clone(self.estimator)

File ~/lab3/lib/python3.11/site-packages/sklearn/base.py:1237, in is_classifier(estimator)
1230 warnings.warn(
1231 f"passing a class to {print(inspect.stack()[0][3])} is deprecated and "
1232 "will be removed in 1.8. Use an instance of the class instead.",
1233 FutureWarning,
1234 )
1235 return getattr(estimator, "_estimator_type", None) == "classifier"
-> 1237 return get_tags(estimator).estimator_type == "classifier"

File ~/lab3/lib/python3.11/site-packages/sklearn/utils/_tags.py:405, in get_tags(estimator)
403 for klass in reversed(type(estimator).mro()):
404 if "__sklearn_tags__" in vars(klass):
--> 405 sklearn_tags_provider[klass] = klass.__sklearn_tags__(estimator) # type: ignore[attr-defined]
406 class_order.append(klass)
407 elif "_more_tags" in vars(klass):

File ~/lab3/lib/python3.11/site-packages/sklearn/base.py:540, in ClassifierMixin.__sklearn_tags__(self)
539 def __sklearn_tags__(self):
--> 540 tags = super().__sklearn_tags__()
541 tags.estimator_type = "classifier"
542 tags.classifier_tags = ClassifierTags()

AttributeError: 'super' object has no attribute '__sklearn_tags__'

I get the same error after fitting a model when I try to display it in a Jupyter notebook, as well as when I try to load a saved model.

@piotrjacak Try using scikit-learn version 1.5.0; does that solve your issue?
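If you go the downgrade route, it may help to confirm which versions are actually active in the environment. A minimal check, using the version numbers reported above as the known-bad combination:

import sklearn
import xgboost

# The failure above was observed with xgboost 2.1.3 + scikit-learn 1.6.0;
# staying on scikit-learn 1.5.x avoids the new tag API path.
print("scikit-learn:", sklearn.__version__)
print("xgboost:", xgboost.__version__)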

Thank you, that helped. While trying to figure this out I found another workaround: I wrapped XGBClassifier in a class built on sklearn's BaseEstimator and ClassifierMixin, and passed an instance of that class to GridSearchCV. I used the following code:

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Note: the mixin must come before BaseEstimator so that ClassifierMixin's
# __sklearn_tags__ can reach BaseEstimator via super() under scikit-learn 1.6.
class SklearnXGBClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, **kwargs):
        self.model = XGBClassifier(**kwargs)

    def fit(self, X, y, **kwargs):
        self.model.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

    def get_params(self, deep=True):
        # Delegate to the wrapped model so GridSearchCV can clone the estimator.
        return self.model.get_params(deep=deep)

    def set_params(self, **params):
        self.model.set_params(**params)
        return self

xgb = SklearnXGBClassifier(objective='binary:logistic')
grid_search = GridSearchCV(xgb, param_grid, scoring='accuracy', cv=5, verbose=1)
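For completeness, a usage sketch; param_grid was not shown in the original post, so the values below are placeholders:

param_grid = {
    'n_estimators': [100, 200],   # placeholder values, not from the original post
    'max_depth': [3, 5],
    'learning_rate': [0.05, 0.1],
}

grid_search = GridSearchCV(SklearnXGBClassifier(objective='binary:logistic'),
                           param_grid, scoring='accuracy', cv=5, verbose=1)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)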

We experienced the same issue. I think it's related to changes in the scikit-learn 1.6.0 release; see scikit-learn/scikit-learn#30122 and the release notes at https://scikit-learn.org/stable/whats_new/v1.6.html#sklearn-base

See DoubleML/doubleml-for-py#278 for the corresponding DoubleML issue.
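To illustrate the mechanism with a standalone toy sketch (the class names below are made up and this is not xgboost's actual class layout): scikit-learn 1.6 builds tags by walking the MRO and calling each class's __sklearn_tags__, and ClassifierMixin's implementation delegates to super(). When BaseEstimator does not sit after the mixin in the MRO, that super() call falls through to object, which produces exactly the AttributeError in the traceback above:

class ToyBase:
    def __sklearn_tags__(self):
        return {"estimator_type": None}

class ToyClassifierMixin:
    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()  # relies on a class later in the MRO
        tags["estimator_type"] = "classifier"
        return tags

class Works(ToyClassifierMixin, ToyBase):   # mixin first: super() reaches ToyBase
    pass

class Breaks(ToyBase, ToyClassifierMixin):  # mixin last: super() falls through to object
    pass

print(Works().__sklearn_tags__())           # {'estimator_type': 'classifier'}

try:
    # scikit-learn's get_tags() calls each class's __sklearn_tags__ directly,
    # which is what triggers the failure for the badly ordered MRO.
    ToyClassifierMixin.__sklearn_tags__(Breaks())
except AttributeError as exc:
    print(exc)                              # 'super' object has no attribute '__sklearn_tags__'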

The fix is in the master branch, but it will take some time for us to make a new release. Please keep scikit-learn at 1.5 or use the nightly XGBoost build.
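Until that release is out, a small guard at the top of a notebook can catch the bad combination early. This is a sketch under the assumption that the fix ships only in releases newer than 2.1.3:

from packaging.version import Version

import sklearn
import xgboost

# Assumption: xgboost releases up to and including 2.1.3 lack support for
# scikit-learn 1.6's __sklearn_tags__ API.
if Version(sklearn.__version__) >= Version("1.6") and Version(xgboost.__version__) <= Version("2.1.3"):
    raise RuntimeError(
        "Incompatible versions: pin scikit-learn<1.6 or use a nightly xgboost build."
    )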