scikit-learn-contrib/category_encoders

skl v1.2

bmreiniger opened this issue · 3 comments

scikit-learn will be releasing version 1.2 soon. I upgraded to their master branch and ran the tests here; everything appears to work except that

  1. sklearn will start throwing an error if feature names are not all strings (source). The test test_encoders.test_column_transformer uses tests.helpers.create_dataset which includes one column with name 321, and breaks. I don't think there's much reason to support the mixed column name types, and suggest modifying the test dataset.
  2. the load_boston dataset has been deprecated and will be removed in 1.2; the test test_encoders.test_string_index should be changed to some other sklearn dataset, or perhaps the internal one.
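For the first item, a minimal sketch of how the test dataset's column labels could be checked and normalized. It works on a plain list of labels; the helper names and the sample labels are hypothetical and not taken from tests.helpers.

```python
def non_string_columns(columns):
    """Return the column labels that are not plain strings, in order."""
    return [c for c in columns if not isinstance(c, str)]


def stringify_columns(columns):
    """Return every label converted to a string, preserving order."""
    return [str(c) for c in columns]


# Hypothetical labels mirroring the mixed case described above
# (one integer label, 321, among string labels).
labels = ["a", 321, "b"]
offending = non_string_columns(labels)
fixed = stringify_columns(labels)
```

On a pandas DataFrame the same normalization could be written as `df.columns = df.columns.astype(str)`, although simply renaming the one column in the test dataset may be the smaller change.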

I'm surprised I don't see errors from get_feature_names vs. get_feature_names_out, since #382 isn't merged yet; I wonder whether I've done something wrong in setting up my environment?

Hi Ben,

I just merged #382; I'm not sure why it wasn't a problem. Do you know when sklearn 1.2 will be released as stable? For 1, I agree that we should just delete the test if it is unsupported.
Item 2 will be more work, since all our doctests rely on the Boston housing dataset. The deprecation message suggests the California housing dataset instead. Maybe we should just switch to that, but the column names will be different as well.

Thanks for testing against 1.2 and pointing this out! Great help!

1.2 release candidate just hit PyPI, with a regular release to follow in a "week or two".

I probably won't have time to replace the Boston housing dataset in the next couple of weeks. Should we just put sklearn < 1.2 in the requirements for the moment?
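In a requirements file the pin would read `scikit-learn<1.2`. As a complement, here is a rough sketch of a version gate that tests or doctests could use to skip anything depending on load_boston. The helper names are hypothetical, and the parsing is deliberately simple (it reads only the leading numeric part of the version string) rather than a full version parser.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release segment, e.g. '1.2.dev0' -> (1, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_load_boston(version):
    """Assume load_boston is available only before the 1.2 series."""
    return release_tuple(version)[:2] < (1, 2)


# Typical use (assumes scikit-learn is installed):
#   import sklearn
#   if not has_load_boston(sklearn.__version__): skip the Boston-based tests
```

Pre-releases such as a 1.2 release candidate are treated as part of the 1.2 series here, which matches the thread's observation that the dataset is going away in that release.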