skrub-data/skrub

ENH Remove the `OneHotEncoder` inheritance from `SimilarityEncoder`


Problem Description

Follows up on #801

`SimilarityEncoder` inherits from scikit-learn's `OneHotEncoder`, whose implementation may be heavy. We gain little from this parent class, since we merely call `check_X` during `fit`.

Feature Description

Replace the inheritance with `(TransformerMixin, BaseEstimator)` and make the small updates this requires. This would also be an opportunity to do some refactoring if needed.
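A minimal sketch of the proposed structure, assuming scikit-learn's `check_array` can stand in for the inherited `check_X`. The class name `MinimalSimilarityEncoder` and the helper `_trigram_similarity` are hypothetical, and the character-trigram Jaccard similarity is a simplified stand-in, not skrub's actual n-gram similarity.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted


def _trigram_similarity(a, b):
    """Jaccard similarity between the character-trigram sets of two strings.

    Hypothetical, simplified stand-in for skrub's n-gram similarity.
    """
    def grams(s):
        s = f" {s} "  # pad with spaces so short strings still yield trigrams
        return {s[i:i + 3] for i in range(len(s) - 2)}

    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


class MinimalSimilarityEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: sklearn mixins only, no OneHotEncoder parent."""

    def __init__(self, categories="auto"):
        self.categories = categories

    def fit(self, X, y=None):
        # Stands in for the inherited check_X: validate and coerce to a 2D object array.
        X = check_array(X, dtype=object)
        self.n_features_in_ = X.shape[1]
        if self.categories == "auto":
            # One sorted array of unique string values per column.
            self.categories_ = [
                np.unique(X[:, j].astype(str)) for j in range(X.shape[1])
            ]
        else:
            self.categories_ = [np.asarray(c, dtype=str) for c in self.categories]
        return self

    def transform(self, X):
        check_is_fitted(self, "categories_")
        X = check_array(X, dtype=object)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, but the encoder was "
                f"fitted with {self.n_features_in_}"
            )
        blocks = []
        for j, cats in enumerate(self.categories_):
            col = X[:, j].astype(str)
            # Similarity of every value in column j to every category of column j.
            blocks.append(
                np.array([[_trigram_similarity(v, c) for c in cats] for v in col])
            )
        return np.hstack(blocks)
```

Here `fit_transform` comes from `TransformerMixin` and `get_params`/`set_params` from `BaseEstimator`, so nothing else in this sketch depends on `OneHotEncoder`. The mixin is listed before `BaseEstimator`, following scikit-learn's convention for custom estimators.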

Alternative Solutions

No response

Additional Context

No response

Also, following other discussions: should this encoder be made to work on dataframes and manipulate columns by name rather than by index?
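One possible approach, sketched under assumptions: record the column names at `fit` time (scikit-learn's convention is to store them in `feature_names_in_`), then select and reorder columns by those names at `transform` time. The helpers `record_feature_names` and `select_by_name` are hypothetical and assume pandas input.

```python
import numpy as np
import pandas as pd


def record_feature_names(X):
    """Return column names as an object array, or None for non-DataFrame input.

    Hypothetical helper mirroring scikit-learn's ``feature_names_in_`` convention.
    """
    if isinstance(X, pd.DataFrame):
        return np.asarray(X.columns, dtype=object)
    return None


def select_by_name(X, feature_names_in):
    """Select the columns seen during fit by name rather than by position."""
    if feature_names_in is None:
        # Fitted on an array: keep positional behaviour.
        return X
    if not isinstance(X, pd.DataFrame):
        raise TypeError("fitted on a DataFrame, so transform expects a DataFrame")
    missing = [name for name in feature_names_in if name not in X.columns]
    if missing:
        raise ValueError(f"columns missing at transform time: {missing}")
    # Extra columns are dropped; column order follows fit time, not transform time.
    return X.loc[:, list(feature_names_in)]


train = pd.DataFrame({"city": ["paris", "london"], "country": ["fr", "uk"]})
names = record_feature_names(train)
new = pd.DataFrame({"country": ["de"], "extra": [0], "city": ["berlin"]})
print(select_by_name(new, names).columns.tolist())  # ['city', 'country']
```

With this design, column order and extra columns at `transform` time no longer matter, while array input keeps the current index-based behaviour.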