scikit-learn-contrib/category_encoders

OneHotEncoder: handle_missing = 'ignore' would be very useful

woodly0 opened this issue · 2 comments

Expected Behavior

It would be nice to be able to ignore missing values instead of creating new columns with an "_nan" suffix, as pandas already allows with get_dummies(..., dummy_na=False). What do you think?
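For reference, pandas selects between the two behaviors with the dummy_na flag of pd.get_dummies. The small comparison below (using only the "that" column from the example further down) shows that dummy_na=True adds a "that_nan" column, while dummy_na=False encodes the missing value as a row of zeros, which is the "ignore" behavior requested here.

```python
import numpy as np
import pandas as pd

df_that = pd.DataFrame({"that": ["A", "B", "A", np.nan]})

# dummy_na=True: NaN gets its own indicator column named 'that_nan'
with_nan = pd.get_dummies(df_that, dummy_na=True, dtype=int)
# dummy_na=False (the default): NaN has no column, so its row is all zeros
ignored = pd.get_dummies(df_that, dummy_na=False, dtype=int)

print(with_nan.columns.tolist())  # ['that_A', 'that_B', 'that_nan']
print(ignored.iloc[3].tolist())   # [0, 0]
```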

Actual Behavior

To my knowledge, this option does not exist in the current latest version.

Steps to Reproduce

import pandas as pd
import numpy as np
from category_encoders import OneHotEncoder

encoder = OneHotEncoder(
    cols=None,  # all non-numeric
    return_df=True,
    handle_missing="value",  # would be nice to have the option 'ignore'
    use_cat_names=True,
)
df = pd.DataFrame(
    {"this": ["GREEN", "GREEN", "YELLOW", "YELLOW"], "that": ["A", "B", "A", np.nan]}
)

print(encoder.fit_transform(df))  # unwanted result: extra column with an "_nan" suffix
print(pd.get_dummies(df, dummy_na=False))  # wanted result: the missing value is encoded as all zeros
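Until such an option exists, the requested semantics can be approximated outside category_encoders. The sketch below is a hypothetical helper (OneHotIgnoreMissing is not part of category_encoders or pandas) that assumes pandas is available: it learns categories from the non-missing values at fit time and, at transform time, encodes missing values as all-zero rows. Unlike calling pd.get_dummies directly on each new frame, it keeps the output columns fixed between fit and transform.

```python
import numpy as np
import pandas as pd


class OneHotIgnoreMissing:
    """Hypothetical sketch of handle_missing='ignore' (not a category_encoders API).

    Categories are learned from non-missing values only, so a missing value
    becomes a row of zeros instead of getting its own '_nan' column.
    """

    def __init__(self, cols=None):
        # None -> encode all non-numeric columns, mirroring cols=None in the issue
        self.cols = cols

    def fit(self, X: pd.DataFrame):
        cols = self.cols
        if cols is None:
            cols = X.select_dtypes(exclude="number").columns.tolist()
        # dropna() ensures NaN never becomes a category of its own
        self.categories_ = {c: sorted(X[c].dropna().unique()) for c in cols}
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Keep the columns that are not encoded as they are
        parts = [X.drop(columns=list(self.categories_))]
        for col, cats in self.categories_.items():
            # Fixed categories keep the output columns stable; values outside
            # `cats` (including NaN) get no column, so their row is all zeros
            cat = pd.Categorical(X[col], categories=cats)
            dummies = pd.get_dummies(cat, prefix=col, dtype=int)
            dummies.index = X.index
            parts.append(dummies)
        return pd.concat(parts, axis=1)

    def fit_transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return self.fit(X).transform(X)


df = pd.DataFrame(
    {"this": ["GREEN", "GREEN", "YELLOW", "YELLOW"], "that": ["A", "B", "A", np.nan]}
)
enc = OneHotIgnoreMissing()
print(enc.fit_transform(df))  # columns: this_GREEN, this_YELLOW, that_A, that_B
```

Because the categories are frozen at fit time, values never seen during fit are also encoded as all zeros, so this sketch does not distinguish missing values from unknown ones; a built-in option would probably want to keep those two cases separate.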

Specifications

  • Version: 2.5.1.post0