LabelEncoder Usage
r0f1 opened this issue · 2 comments
r0f1 commented
Hi,
The following piece of code throws an error. Why?
import pandas as pd
from kaggler.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit_transform(pd.Series([1,1,1,2,2,2,3,3,3]))
Error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
c:\Users\semic\Desktop\dsi19-oct\main.py in
1 le = LabelEncoder()
----> 2 le.fit_transform(pd.Series([1,1,1,2,2,2,3,3,3]))
~\Anaconda3\lib\site-packages\kaggler\preprocessing\categorical.py in fit_transform(self, X, y)
121 """
122
--> 123 self.label_encoders = [None] * X.shape[1]
124 self.label_maxes = [None] * X.shape[1]
125
IndexError: tuple index out of range
paullo0106 commented
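Unlike sklearn.preprocessing's LabelEncoder, which encodes a one-dimensional array of labels,
fit_transform() in this package takes a pandas.DataFrame as input and encodes every column in it.
A pandas.Series is one-dimensional, so its shape is a one-element tuple and X.shape[1] does not exist.
That is why you get the "tuple index out of range" error. Here is the method's source: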
def fit_transform(self, X, y=None):
    """Encode categorical columns into label encoded columns
    Args:
        X (pandas.DataFrame): categorical columns to encode
    Returns:
        (pandas.DataFrame): label encoded columns
    """
    self.label_encoders = [None] * X.shape[1]
    self.label_maxes = [None] * X.shape[1]
    for i, col in enumerate(X.columns):
        self.label_encoders[i], self.label_maxes[i] = \
            self._get_label_encoder_and_max(X[col])
        X.loc[:, col] = (X[col].fillna(NAN_INT)
                               .map(self.label_encoders[i])
                               .fillna(0))
    return X
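A minimal sketch of a possible workaround, assuming the method behaves as its docstring above says: wrap the Series in a one-column DataFrame before passing it in. The column name "x" and the helper encode_with_kaggler are made up for illustration and are not part of kaggler. The helper is only defined here, not called, so the snippet runs without kaggler installed.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3])

# A Series is one-dimensional: its shape tuple has a single entry,
# so indexing shape[1] raises "IndexError: tuple index out of range".
print(s.shape)   # (9,)

# Wrapping the Series in a one-column DataFrame gives a two-entry shape.
df = s.to_frame(name="x")  # "x" is an arbitrary, illustrative column name
print(df.shape)  # (9, 1)


def encode_with_kaggler(frame):
    """Hypothetical helper: label-encode every column of a DataFrame."""
    # Imported lazily so the rest of the snippet works without kaggler.
    from kaggler.preprocessing import LabelEncoder
    # Pass a copy, because the quoted fit_transform assigns into X in place.
    return LabelEncoder().fit_transform(frame.copy())


# With kaggler installed, the call would look like:
# encoded = encode_with_kaggler(df)
```

If you only need to encode a single array of labels, sklearn.preprocessing.LabelEncoder accepts one-dimensional input directly.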
r0f1 commented
Ok thanks!