Handle missing in one hot encoder

Question

Handle missing in one hot encoder

PaulWestenthanner opened this issue a year ago · 3 comments

Expected Behavior

Currently, handle_missing="value" adds a new column, although the documentation says 'value' will encode a new value as 0 in every dummy column. The encoder should follow the documentation and encode a missing value as all zeros.
Furthermore, we need a test for this behavior.

Actual Behavior

handle_missing="value" adds a column for the missing value instead of encoding it as 0 in every dummy column.

Steps to Reproduce the Problem

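import pandas as pd
from category_encoders import OneHotEncoder

# handle_missing="value" is the option whose behavior is in question
he = OneHotEncoder(handle_missing="value")

# c1 contains one missing value (None) in the last row
data = [("foo", 1), ("bar", 2), (None, 6)]
data = pd.DataFrame(data, columns=["c1", "c2"])
print(he.fit_transform(data))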
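
For comparison, here is a minimal sketch of the encoding the documentation describes for column c1, written in plain Python rather than with category_encoders. The helper one_hot_rows is hypothetical and is not part of the library. It only illustrates that a missing value should match no category and therefore produce a row of zeros, with no extra column.

```python
def one_hot_rows(values, categories):
    """Encode each value as one 0/1 indicator per known category.

    Hypothetical helper for illustration only. A missing value (None)
    matches no category, so its row is all zeros.
    """
    return [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]


# Column c1 from the reproduction; categories are the non-missing values.
c1 = ["foo", "bar", None]
categories = ["foo", "bar"]

encoded = one_hot_rows(c1, categories)
for value, row in zip(c1, encoded):
    print(value, row)
```

A regression test for the encoder could assert the same two properties: the number of dummy columns equals the number of non-missing categories, and the row for the missing value sums to zero.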

Specifications

Version: 2.6
Platform: linux

Answer 1 · 2023-03-21T13:36:12.000Z

Would this replace the new "ignore" from #396?

I would expect this to be the correct behavior; is the added column a longstanding behavior, or perhaps a regression that wasn't caught in testing?

Answer 2 · 2023-03-24T09:37:00.000Z

Oh, you're right. I missed this when adding the ignore option. Thanks for pointing it out.
I'm not sure about the naming, though. Most encoders have the option value, which fills in "some value that makes sense", so that name makes sense to people familiar with the library. On the other hand, ignore is more telling.