scikit-learn-contrib/category_encoders

Pandas' string columns are not recognized

tvdboom opened this issue · 3 comments

Expected Behavior

Category encoders should recognize pandas string and string[pyarrow] types.

Actual Behavior

The column isn't recognized as categorical, and the dataframe is returned as is.

Steps to Reproduce the Problem

import pandas as pd
from category_encoders.target_encoder import TargetEncoder

X = pd.DataFrame([['a'], ['b']], dtype="string")
y = [0, 1]
print(X.dtypes)

print(TargetEncoder().fit_transform(X, y))

produces output:

0    string[python]
dtype: object

Warning: No categorical columns found. Calling 'transform' will only return input data.

   0
0  a
1  b

Specifications

  • Version: 2.6.2

I agree that string and arrow string should be recognized as categorical.
Even the categorical type itself is currently not recognized as such.

def get_obj_cols(df):

That's the function that needs to be adjusted (and renamed).
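For illustration, a minimal sketch of what a broader dtype check might look like. This is not the library's actual implementation: the name `get_categorical_cols` is hypothetical, and the sketch assumes that object, categorical, and pandas `StringDtype` columns (which covers both `string[python]` and `string[pyarrow]`) should all be treated as categorical. Columns typed as `pd.ArrowDtype(pa.string())` are not covered here.

```python
import pandas as pd


def get_categorical_cols(df):
    """Hypothetical helper: return names of columns to treat as categorical.

    Assumption: object, category, and pandas StringDtype columns all count.
    """
    cols = []
    for name, dtype in df.dtypes.items():
        if (
            pd.api.types.is_object_dtype(dtype)
            or isinstance(dtype, pd.CategoricalDtype)
            # StringDtype covers "string", "string[python]" and "string[pyarrow]"
            or isinstance(dtype, pd.StringDtype)
        ):
            cols.append(name)
    return cols


# Small example frame mixing the dtypes discussed in this issue
X = pd.DataFrame(
    {
        "s": pd.array(["a", "b"], dtype="string"),
        "o": ["x", "y"],
        "c": pd.Categorical(["u", "v"]),
        "n": [1, 2],
    }
)
print(get_categorical_cols(X))
```

The numeric column `n` is skipped, while the string, object, and categorical columns are all picked up.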

Alright, I'll make a PR.