normalize feature transformations

Question

lacava opened this issue 7 years ago · 4 comments

Normalize feature transformations automatically before feeding them into the ML fit method. Store the fitted transformer so that it can be reused in prediction/transformation as well.

Answer 1 · 2017-06-14T17:36:00.000Z

add self.scaler = StandardScaler() to init
add self._best_scaler to init
apply the scaler transformation in the transform() method
update self._best_scaler whenever a better model is found
use self._best_scaler when transform() is called during prediction
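
A rough sketch of how those steps could fit together. The class name, the candidates argument, and the use of LinearRegression are assumptions made for illustration, not the project's actual code; only self.scaler and self._best_scaler come from the list above.

```python
import copy

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


class ScaledFeatureSearch:
    """Hypothetical stand-in for the estimator in this issue (not the project's code)."""

    def __init__(self, candidates):
        # candidates: list of feature sets; each set is a list of callables X -> 1-D array
        self.candidates = candidates
        self.ml = LinearRegression()  # assumed ML method, chosen only for the sketch
        self.scaler = StandardScaler()
        self._best_scaler = None
        self._best_ml = None
        self._best_fns = None
        self._best_score = -np.inf

    @staticmethod
    def _features(X, fns):
        return np.column_stack([f(X) for f in fns])

    def fit(self, X, y):
        for fns in self.candidates:
            # Refit the scaler on this candidate's features, then fit the model on scaled data.
            Phi = self.scaler.fit_transform(self._features(X, fns))
            self.ml.fit(Phi, y)
            score = self.ml.score(Phi, y)
            if score > self._best_score:
                # Snapshot the scaler together with the model it was fitted with.
                self._best_score = score
                self._best_fns = fns
                self._best_scaler = copy.deepcopy(self.scaler)
                self._best_ml = copy.deepcopy(self.ml)
        return self

    def transform(self, X):
        # Always use the scaler that belongs to the best model, not the latest one.
        return self._best_scaler.transform(self._features(X, self._best_fns))

    def predict(self, X):
        return self._best_ml.predict(self.transform(X))
```

The deepcopy matters here: self.scaler keeps being refitted as the search continues, so without a snapshot the scaler used at prediction time could drift away from the one the best model was trained with.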

Answer 2 · 2017-06-26T18:22:00.000Z

I think it would be cleaner to use a Pipeline for this:

from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline expects a list of (name, estimator) tuples
steps = [("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
model = Pipeline(steps)

The API is still the same: model.fit(x_train, y_train), then model.predict(x_test).
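
For reference, a minimal end-to-end run of that two-step pipeline. The synthetic data and the train/test split are made up for the example; the point is only that the fitted scaler lives inside the pipeline and is reapplied automatically at predict time.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with columns on very different scales (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=[1.0, 100.0, 0.01], size=(200, 3))
y = 2.0 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.1, size=200)
x_train, x_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

model = Pipeline([("scaler", StandardScaler()), ("estimator", LassoLarsCV())])
model.fit(x_train, y_train)     # fits the scaler on x_train, then the estimator
y_pred = model.predict(x_test)  # reuses the fitted scaler on x_test

# The stored scaler is still accessible if it is needed on its own.
fitted_scaler = model.named_steps["scaler"]
manual_pred = model.named_steps["estimator"].predict(fitted_scaler.transform(x_test))
```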
You could even write a custom transformer that takes a set of expressions/functions and maps x to the features:

steps = [("features", MyTransformer(exprs)), ("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
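
One way MyTransformer might look, as a sketch. The class body, the two example expressions, and the toy data are assumptions; only the name MyTransformer(exprs) and the three-step layout come from the line above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class MyTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: evaluates each function in exprs on X."""

    def __init__(self, exprs):
        self.exprs = exprs  # list of callables, each mapping X -> 1-D array

    def fit(self, X, y=None):
        return self  # stateless: nothing is learned from the data

    def transform(self, X):
        X = np.asarray(X)
        return np.column_stack([f(X) for f in self.exprs])


# Example expressions (made up for illustration).
def square_x0(X):
    return X[:, 0] ** 2


def product_x0_x1(X):
    return X[:, 0] * X[:, 1]


exprs = [square_x0, product_x0_x1]
steps = [("features", MyTransformer(exprs)), ("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
model = Pipeline(steps)

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3.0 * X[:, 0] ** 2 - 2.0 * X[:, 0] * X[:, 1]
model.fit(X, y)
y_pred = model.predict(X)
```

The expressions are written as named module-level functions rather than lambdas so that the whole pipeline stays picklable.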

Using the model down the line becomes much simpler, e.g. saving it and using it for estimation in a different context, as everything you need is contained in the pipeline object.
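
As an illustration of the saving part, here is a small round trip through pickle. The data is synthetic and pickle is just one option, chosen here because it is in the standard library. The scikit-learn docs caution that pickled models are not guaranteed to load across library versions.

```python
import pickle

import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=100)

model = Pipeline([("scaler", StandardScaler()), ("estimator", LassoLarsCV())]).fit(X, y)

# Serialize the whole pipeline (scaler + estimator) as one object, then restore it.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# The restored pipeline scales and predicts exactly like the original.
X_new = rng.normal(size=(5, 3))
same = np.allclose(restored.predict(X_new), model.predict(X_new))
```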

Answer 3 · 2017-06-26T18:38:02.000Z

That's a good point. We should use the sklearn Pipeline for this, and for our transformations. Right now predict() manually transforms the data and then calls predict on the best estimator; it should all be combined into one sklearn Pipeline.

Answer 4 · 2017-07-08T16:55:01.000Z

fixed in commit 9124540