[new features]: Quantile Encoder
cmougan opened this issue · 3 comments
cmougan commented
Implementation of Quantile Encoder from the publication (https://arxiv.org/abs/2105.13783)
Quantile Encoder: Tackling High Cardinality Categorical Features in Regression Problems
Carlos Mougan, David Masip, Jordi Nin, Oriol Pujol
Zainny1234 commented
Hi cmougan. Thanks for sharing the paper. I am trying to build a rental price AVM model, but some of my categorical features have high cardinality. I am going through the paper and having some difficulty grasping the methodology. Is there a coded solution anywhere for this?
cmougan commented
Hi @Zainny1234! This encoder is used the same way as the other encoders in the category_encoders package.
>>> from category_encoders import *
>>> import pandas as pd
>>> # load_boston was removed in scikit-learn 1.2; this example requires an older version
>>> from sklearn.datasets import load_boston
>>> bunch = load_boston()
>>> y = bunch.target
>>> X = pd.DataFrame(bunch.data, columns=bunch.feature_names)
>>> enc = QuantileEncoder(cols=['CHAS', 'RAD']).fit(X, y)
>>> numeric_dataset = enc.transform(X)
>>> print(numeric_dataset.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 13 columns):
CRIM 506 non-null float64
ZN 506 non-null float64
INDUS 506 non-null float64
CHAS 506 non-null float64
NOX 506 non-null float64
RM 506 non-null float64
AGE 506 non-null float64
DIS 506 non-null float64
RAD 506 non-null float64
TAX 506 non-null float64
PTRATIO 506 non-null float64
B 506 non-null float64
LSTAT 506 non-null float64
dtypes: float64(13)
memory usage: 51.5 KB
None
Until the PR is accepted, you can use the encoder from the sktools package:
from sktools import QuantileEncoder
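For intuition about the methodology, here is a minimal pandas sketch of the idea. It assumes the encoding is the per-category quantile of the target, shrunk toward the global quantile with an m-estimate weight. The function name `quantile_encode`, its parameters, and the toy rental data are hypothetical and only for illustration. This is not the library's implementation, and it does not handle categories unseen during fitting.

```python
import pandas as pd


def quantile_encode(categories, target, quantile=0.5, m=1.0):
    """Hypothetical helper: map each category to a smoothed target quantile."""
    df = pd.DataFrame({"cat": list(categories), "y": list(target)})

    # Global quantile of the target, used as the prior.
    prior = df["y"].quantile(quantile)

    # Per-category quantile of the target and per-category sample count.
    grouped = df.groupby("cat")["y"]
    q = grouped.quantile(quantile)
    n = grouped.count()

    # m-estimate smoothing: rare categories are pulled toward the prior,
    # frequent categories stay close to their own quantile.
    mapping = (n * q + m * prior) / (n + m)

    # Replace each category label with its encoded value.
    encoded = df["cat"].map(mapping)
    return encoded, mapping


# Toy data (made up): rental price by neighborhood.
neighborhoods = ["a", "a", "a", "b"]
prices = [10.0, 20.0, 30.0, 100.0]

encoded, mapping = quantile_encode(neighborhoods, prices, quantile=0.5, m=1.0)
print(mapping)
```

With this toy data the global median is 25. Neighborhood `a` (median 20, three rows) is encoded as (3 * 20 + 25) / 4 = 21.25, while neighborhood `b` (median 100, one row) is pulled much harder toward the prior: (100 + 25) / 2 = 62.5. In a real model the mapping should be fit on training data only, so that target information does not leak into validation rows.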
PaulWestenthanner commented
added in PR #303