scikit-learn-contrib/category_encoders

get_feature_names_out is incompatible with sklearn estimators and, consequently, with eli5

DZIMDZEM opened this issue · 3 comments

Expected Behavior

In BaseEncoder, get_feature_names_out() should accept an input_features argument in addition to self, as other sklearn estimators do.

def get_feature_names_out(self, input_features=None):
    """
    ...
    """
    return _check_feature_names_in(self, input_features)

Actual Behavior

BaseEncoder's get_feature_names_out() accepts only one argument: self. This makes it incompatible with eli5 and with other modules that pass feature names to sklearn-style estimators.

def get_feature_names_out(self) -> List[str]:
    """..."""
    if not isinstance(self.feature_names_out_, list):
        raise NotFittedError("Estimator has to be fitted to return feature names.")
    else:
        return self.feature_names_out_
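To illustrate the failure, here is a minimal sketch that does not import category_encoders. StrictEncoder is a hypothetical stand-in that only copies the 2.6.0 signature, and names_via_pipeline_style_call is an assumed helper that passes the feature names positionally, the way sklearn's Pipeline does when it chains get_feature_names_out across steps.

```python
class StrictEncoder:
    """Hypothetical stand-in with the 2.6.0-style signature (not the real BaseEncoder)."""

    def __init__(self, feature_names_out):
        self.feature_names_out_ = list(feature_names_out)

    def get_feature_names_out(self):
        # Accepts no argument besides self, like the method quoted above.
        return self.feature_names_out_


def names_via_pipeline_style_call(step, input_features=None):
    """Assumed helper: forwards input_features positionally to the step."""
    return step.get_feature_names_out(input_features)


encoder = StrictEncoder(["color", "size"])

try:
    names_via_pipeline_style_call(encoder)
    call_failed = False
    error_message = ""
except TypeError as exc:
    # The extra positional argument is rejected by the strict signature.
    call_failed = True
    error_message = str(exc)
```

Calling the method directly with no arguments still works; only callers that pass input_features hit the TypeError.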

Steps to Reproduce the Problem

  1. Add an input_features keyword argument to get_feature_names_out.
  2. Copy/inherit the _check_feature_names_in function from sklearn.utils.validation so that get_feature_names_out behaves the same way as in sklearn's own transformers.
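The two steps above could look roughly like the sketch below. It is not the category_encoders implementation and not sklearn's helper: SketchEncoder is a hypothetical class, the validation is a simplified stand-in for _check_feature_names_in, and RuntimeError is used in place of NotFittedError so the example runs without sklearn installed.

```python
class SketchEncoder:
    """Hypothetical stand-in, not the category_encoders BaseEncoder."""

    def fit(self, columns):
        # A real encoder derives these from the training data; in this sketch
        # the output names are assumed to equal the input column names.
        self.feature_names_in_ = list(columns)
        self.feature_names_out_ = list(columns)
        return self

    def get_feature_names_out(self, input_features=None):
        if not hasattr(self, "feature_names_out_"):
            raise RuntimeError("Estimator has to be fitted to return feature names.")
        # Simplified check: if names are supplied, they must match those seen in fit.
        if input_features is not None and list(input_features) != self.feature_names_in_:
            raise ValueError("input_features is not equal to feature_names_in_")
        return self.feature_names_out_


fitted = SketchEncoder().fit(["color", "size"])
names_default = fitted.get_feature_names_out()
names_explicit = fitted.get_feature_names_out(["color", "size"])
```

With this signature, callers may omit input_features or pass it, and a mismatch with the fitted column names raises an error instead of being silently ignored.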

As a temporary solution you can just override the method. Example for TargetEncoder:

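from category_encoders import TargetEncoder

class TargetEncoderFixed(TargetEncoder):
    # Accept and ignore extra arguments (e.g. input_features) passed by sklearn or eli5.
    def get_feature_names_out(self, *args, **kwargs):
        return self.feature_names_out_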

Specifications

  • Version: 2.6.0
  • Platform: Windows
  • Subsystem: pipeline workflow

I think this was fixed by #398.
Could you please confirm this? If you check the current master branch, the get_feature_names_out function already supports the input_features argument. I haven't built a release for this bugfix yet, though, so if you install from PyPI you should still experience this problem.
I can build a release this week if it solves your problem.

@PaulWestenthanner, yes, it resolves the problem. Thank you for the fast response!

Version 2.6.1 is now published on PyPI.