Utility functions, preprocessing steps, and classes that I need in my research and development projects with scikit-learn.
You can install sklearn-utils with pip:
pip install sklearn-utils
If you want to scale your data based on reference values, you can use StandardScalerByLabel. For example, I scale all blood samples relative to the healthy samples.
from sklearn_utils.preprocessing import StandardScalerByLabel
preprocessing = StandardScalerByLabel('healthy')
X_t = preprocessing.fit_transform(X, y)
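To make the idea of scaling by a reference label concrete, here is a minimal sketch in plain Python. It assumes the scaler computes the per-feature mean and standard deviation from the reference rows only (here, the rows labelled 'healthy') and then applies them to every row; that is an assumption about the behaviour, not a copy of the library's implementation. The function name `scale_by_reference` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def scale_by_reference(X, y, reference_label):
    """Standardize every row of X using statistics of the reference rows only.

    Hypothetical helper for illustration; not part of sklearn-utils.
    """
    # Keep only the rows whose label matches the reference label.
    ref_rows = [row for row, label in zip(X, y) if label == reference_label]
    # Transpose the reference rows to get one tuple per feature column.
    columns = list(zip(*ref_rows))
    means = [mean(col) for col in columns]
    # Population standard deviation (ddof=0), the convention used by
    # scikit-learn's StandardScaler; a zero deviation falls back to 1.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    # Apply the reference statistics to all rows, healthy or not.
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


X = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
y = ["healthy", "healthy", "sick"]
X_scaled = scale_by_reference(X, y, "healthy")
```

With this convention the healthy samples end up centred on zero, and every other sample is expressed as a number of healthy standard deviations away from the healthy mean.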
Or you may want to get a list of dicts back at the end of a scikit-learn pipeline, after a set of operations and feature selection.
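from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn_utils.preprocessing import InverseDictVectorizer

vect = DictVectorizer(sparse=False)
skb = SelectKBest(k=100)
# The last step maps the selected columns back to a list of dicts.
pipe = Pipeline([
    ('vect', vect),
    ('skb', skb),
    ('inv_vect', InverseDictVectorizer(vect, skb))
])
X_t = pipe.fit_transform(X, y)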
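The final step of that pipeline can be pictured with the sketch below. It assumes the inverse step pairs each surviving column with its feature name, using the names known to the vectorizer and the boolean mask produced by the feature selector. In scikit-learn those would come from `DictVectorizer.get_feature_names_out()` and `SelectKBest.get_support()`; here they are hard-coded so the example runs without any dependency. The helper `rows_to_dicts` and the sample values are hypothetical, and the real InverseDictVectorizer may differ in details such as how it handles zero values.

```python
def rows_to_dicts(rows, feature_names, support_mask):
    """Turn selected feature rows back into a list of dicts.

    Hypothetical helper for illustration; not part of sklearn-utils.
    """
    # Names of the features that survived feature selection, in column order.
    kept = [name for name, keep in zip(feature_names, support_mask) if keep]
    # Pair each remaining column value with its feature name.
    return [dict(zip(kept, row)) for row in rows]


feature_names = ["age", "glucose", "weight"]  # as a vectorizer would report
support_mask = [True, False, True]            # as a selector would report
selected_rows = [[30.0, 70.0], [45.0, 82.5]]  # matrix after selection

records = rows_to_dicts(selected_rows, feature_names, support_mask)
```

Each dict in `records` contains only the selected features, which keeps the output readable when the original samples had many keys.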
For more features, you can check the documentation.
The documentation of the project is available at http://sklearn-utils.rtfd.io .