CountEncoder returning categories instead of floats/ints after transform
glevv opened this issue · 2 comments
glevv commented
import numpy as np
import pandas as pd
from category_encoders import CountEncoder

X = pd.DataFrame({
    'some_cat': ['W', 'L', 'W', 'W', 'L'],
    'some_num': np.random.normal(size=5)},
    columns=['some_cat', 'some_num'])
# Cast the column to 'category' dtype before encoding
X['some_cat'] = X['some_cat'].astype('category')

ce = CountEncoder(cols=['some_cat'])
Xt = ce.fit_transform(X)
Xt.info()
print(Xt)
and the output is:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 some_cat 5 non-null category
1 some_num 5 non-null float64
dtypes: category(1), float64(1)
memory usage: 269.0 bytes
some_cat some_num
0 3 0.660804
1 2 -0.150932
2 3 -1.044160
3 3 0.115020
4 2 -0.035625
This breaks sklearn pipelines, since estimators cannot work with the 'category' dtype.
Tested on laptop (category_encoders 2.2.2, pandas 1.2.4) and colab (category_encoders 2.2.2, pandas 1.1.5).
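Until a fix is available, one possible workaround is to cast the encoded column back to an integer dtype after the transform. The snippet below is only a sketch of that idea: it builds a stand-in for `Xt` by hand instead of calling `CountEncoder`, and `cast_counts_to_int` is a hypothetical helper name, not part of category_encoders.

```python
import pandas as pd

# Stand-in for the encoder output above: integer counts stored as 'category'.
# Built by hand for illustration; it does not call category_encoders.
Xt = pd.DataFrame({
    'some_cat': pd.Categorical([3, 2, 3, 3, 2]),
    'some_num': [0.66, -0.15, -1.04, 0.12, -0.04],
})


def cast_counts_to_int(df, cols):
    """Hypothetical helper: return a copy with the given columns cast to int64."""
    out = df.copy()
    for col in cols:
        # The categories are already integers, so the cast keeps the values.
        out[col] = out[col].astype('int64')
    return out


Xt_fixed = cast_counts_to_int(Xt, ['some_cat'])
print(Xt_fixed.dtypes)
```

This assumes the counts contain no missing values. If the encoded column can hold NaN, a float dtype would probably be the safer target.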
PaulWestenthanner commented
This should be fixed by #336, which will be included in the next release.
On pandas 1.4.0 I get:
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 some_cat 5 non-null int64
1 some_num 5 non-null float64
dtypes: float64(1), int64(1)
memory usage: 208.0 bytes
PaulWestenthanner commented
Fixed in version 2.4.0. Closing the issue.