OneHotEncoderDF: OneHot encoder wrapper fails for columns reduction options
Closed this issue · 0 comments
mgelsm commented
Describe the bug
The wrapper for the OneHot encoder (OneHotEncoderDF) fails when a column-reduction option is set (drop="if_binary" or drop="first").
The wrapper automatically computes the expected number of columns in the transformed dataset without taking the drop option into account, so the expected count no longer matches what the encoder actually returns.
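To make the mismatch concrete, here is a minimal sketch of how the drop option changes the number of one-hot columns. It follows scikit-learn's documented semantics for drop. The helper name expected_onehot_columns, the feature names, and the category values are all hypothetical; this is not sklearndf's actual implementation.

```python
from typing import Dict, List, Optional


def expected_onehot_columns(
    categories: Dict[str, List[str]], drop: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: names of the one-hot columns left after `drop`."""
    columns: List[str] = []
    for feature, values in categories.items():
        if drop == "first":
            # The first category is dropped for every feature
            kept = values[1:]
        elif drop == "if_binary" and len(values) == 2:
            # The first category is dropped only for two-category features
            kept = values[1:]
        else:
            # No reduction: one output column per category
            kept = values
        columns.extend(f"{feature}_{value}" for value in kept)
    return columns


# Hypothetical categories, listed in the sorted order an encoder would learn
categories = {"churned": ["no", "yes"], "plan": ["basic", "plus", "pro"]}

print(len(expected_onehot_columns(categories)))               # 5
print(len(expected_onehot_columns(categories, "if_binary")))  # 4
print(len(expected_onehot_columns(categories, "first")))      # 3
```

A count that ignores drop would always be 5 here, while the encoder would return 4 or 3 columns depending on the option.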
To Reproduce
Steps to reproduce the behavior:
- Open a notebook
- Run the following code:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearndf.pipeline import PipelineDF
from sklearndf.transformation import (
    ColumnTransformerDF,
    OneHotEncoderDF,
    SimpleImputerDF,
)

# Churn dataset: features and target (contents elided in the report)
X_churn: pd.DataFrame = ...
y_churn: pd.Series = ...
<img width="1088" alt="Screenshot 2021-02-16 at 16 09 49" src="https://user-images.githubusercontent.com/32160831/108081572-6733de80-7071-11eb-8bca-f52932a4173e.png">
# For categorical features we will use the mode as the imputation value and also one-hot encode
preprocessing_categorical = PipelineDF(
    steps=[
        ("imputer", SimpleImputerDF(strategy="most_frequent", fill_value="<na>")),
        ("one-hot", OneHotEncoderDF(sparse=False, drop="if_binary")),
    ]
)
# For numeric features we will impute using the median
preprocessing_numerical = SimpleImputerDF(strategy="median")
# Put the pipeline together
preprocessing_features = ColumnTransformerDF(
    transformers=[
        (
            "categorical",
            preprocessing_categorical,
            make_column_selector(dtype_include=object),
        ),
        (
            "numerical",
            preprocessing_numerical,
            make_column_selector(dtype_include=np.number),
        ),
    ]
)
# Run the preprocessing
transformed_features = preprocessing_features.fit_transform(X=X_churn, y=y_churn)
transformed_features.head()
- See error
Expected behavior
With drop="if_binary", the transformed dataset should contain only one column for each categorical column that has exactly 2 unique values.
- Version: sklearndf==1.0.1