paucablop/chemotools

Improve compatibility with the `set_output` API from `scikit-learn`

paucablop opened this issue · 0 comments

Description

All the transformers from chemotools are compatible with scikit-learn; that is the objective of chemotools 👍. A recent scikit-learn release (version 1.2) introduced the set_output API, which lets the user request pandas output from a transformer. The transformer then returns a pandas.DataFrame object instead of the default numpy.ndarray. This works fine with most chemotools transformers, but I have some specific issues:

👉 The column names are lost after the transformation

When I use a chemotools transformer set up to produce a pandas.DataFrame, it does not keep the input column names, so the output comes back without them. I have compared this with other scikit-learn transformers (such as StandardScaler()), and they do keep the column names in the output.

👉 The API does not work when the transformer reduces the number of features

Some transformers reduce the number of features in the dataset (e.g., they select a subset of its columns). These are the variable selection transformers. I don't really know how to fix this issue.

Hacktoberfest Challenge

We invite open source developers to contribute to our project during Hacktoberfest. The goal is to improve compatibility with the set_output API.

How to Contribute

Here are the contributing guidelines

Contact

[We can have the conversation in the Issue or the Discussion](#45)

Resources

👉 Link to set_output API from scikit-learn

👉 Link to problem description