Improve compatibility with the `set_output` API from `scikit-learn`
paucablop opened this issue · 0 comments
Description
All the transformers in chemotools are compatible with scikit-learn; that is the objective of chemotools. One of the most recent scikit-learn releases introduced the `set_output` API, which lets the user select pandas as the output container. A transformer configured this way returns a `pandas.DataFrame` instead of the default `numpy.ndarray`. This works fine with most chemotools transformers, but I have run into some specific issues:
The column names are lost after the transformation
When I use a chemotools transformer set up to produce a `pandas.DataFrame`, it does not keep the column names and produces an output without them. I have compared this with other scikit-learn transformers (such as `StandardScaler()`), and they do keep the column names in the output.
The API does not work when the transformer reduces the number of features
Some transformers reduce the number of features in the dataset (e.g., by selecting a subset of its columns). These are the variable selection transformers. I don't really know how to fix this issue.
Hacktoberfest Challenge
We invite open source developers to contribute to our project during Hacktoberfest. The goal is to improve compatibility with the `set_output` API.
How to Contribute
Here are the contributing guidelines.
Contact
[We can have the conversation in the Issue or the Discussion](#45)