Utilities for the Python data analysis library Pandas.
Apply one-hot encoding to the given columns.
onehot(df, columns, new_column=False)
    names  names_George  names_John  names_Paul  names_Ringo
0    Paul             0           0           1            0
1  George             1           0           0            0
2   Ringo             0           0           0            1
3   Ringo             0           0           0            1
4    John             0           1           0            0
5    John             0           1           0            0
6    John             0           1           0            0
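
For comparison, the same table can be produced with plain pandas. This is a minimal sketch that assumes the example frame holds the seven names shown above; it uses pandas' own get_dummies and is not necessarily how onehot is implemented.

```python
import pandas as pd

# Example frame assumed from the table above.
df = pd.DataFrame(
    {"names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"]}
)

# One indicator column per distinct value, named "<column>_<value>".
dummies = pd.get_dummies(df["names"], prefix="names", dtype=int)

# Keep the original column next to the indicator columns.
encoded = pd.concat([df, dummies], axis=1)
print(encoded)
```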
Encodes each categorical value as the number of times it occurs in the column.
lc = LabelCount(["names"])
lc.fit(df)
lc.transform(df)
    names  names_labelcount
0    Paul                 1
1  George                 1
2   Ringo                 2
3   Ringo                 2
4    John                 3
5    John                 3
6    John                 3
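
The count encoding above can be sketched in plain pandas as a two-step process: learn the counts, then map them onto the column. The frame contents are assumed from the table, and the split into a "fit" and a "transform" step only mirrors the LabelCount interface; the class's internals may differ.

```python
import pandas as pd

# Example frame assumed from the table above.
df = pd.DataFrame(
    {"names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"]}
)

# "fit": count how often each value occurs in the column.
counts = df["names"].value_counts()

# "transform": replace each value with its learned count.
df["names_labelcount"] = df["names"].map(counts)
print(df)
```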
Encodes each category as the rank of its count in the column. With inverse=False, as here, the most frequent category gets rank 1.
rc = RankCategorical(["names"], inverse=False, new_column=False)
rc.fit(df)
rc.transform(df)
    names  names_rankcategorical
0    Paul                      4
1  George                      3
2   Ringo                      2
3   Ringo                      2
4    John                      1
5    John                      1
6    John                      1
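
A rough plain-pandas equivalent of the ranking above is sketched below. The frame contents are assumed from the table. Breaking ties alphabetically is an assumption made here so that Paul and George, which both occur once, get distinct ranks; it happens to reproduce the table, but RankCategorical may resolve ties differently.

```python
import pandas as pd

# Example frame assumed from the table above.
df = pd.DataFrame(
    {"names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"]}
)

counts = df["names"].value_counts()

# Order categories by descending count; ties are broken alphabetically
# (an assumption, not confirmed behavior of RankCategorical).
ordered = sorted(counts.index, key=lambda name: (-counts[name], name))

# Rank 1 goes to the most frequent category.
rank_of = {name: rank for rank, name in enumerate(ordered, start=1)}

df["names_rankcategorical"] = df["names"].map(rank_of)
print(df)
```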
Encodes each category as the mean of the target over the rows in that category.
te = TargetEncoder(["names"], "target")
te.fit(df)
te.transform(df)
    names  target  names_target_encoding
0    Paul      10                   10.0
1  George       2                    2.0
2   Ringo       4                    4.5
3   Ringo       5                    4.5
4    John       1                    2.0
5    John       3                    2.0
6    John       2                    2.0
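
The per-category means above can be reproduced with a groupby. This is a minimal sketch with the frame contents assumed from the table; it shows the basic technique only and does not claim to match TargetEncoder's implementation (for example, any smoothing or handling of unseen categories).

```python
import pandas as pd

# Example frame assumed from the table above.
df = pd.DataFrame(
    {
        "names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"],
        "target": [10, 2, 4, 5, 1, 3, 2],
    }
)

# "fit": mean of the target within each category.
means = df.groupby("names")["target"].mean()

# "transform": replace each category with its learned mean.
df["names_target_encoding"] = df["names"].map(means)
print(df)
```

In practice the means are usually learned on training rows only and then mapped onto held-out rows, so that the encoded feature does not leak the held-out targets.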
Feature creation based on date information