lacava/few

normalize feature transformations

lacava opened this issue · 4 comments

Normalize feature transformations automatically before feeding them into the ML fit method. Store the fitted scaler so that it can be reused in prediction/transformation as well.

  • add self.scaler = StandardScaler() to init
  • add self._best_scaler to init
  • add scaler transformation to transform() method
  • update self._best_scaler whenever a better model is found
  • apply self._best_scaler when transform() is used in prediction
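
A rough sketch of how the checklist above might fit together. The class name ManualScalingModel, the try_features method, the LinearRegression estimator and the R^2 comparison are all assumptions made for illustration; this is not FEW's actual code.

```python
import copy

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


class ManualScalingModel:
    """Hypothetical stand-in for the estimator class; not the real FEW class."""

    def __init__(self):
        self.scaler = StandardScaler()   # refit for every candidate feature set
        self._best_scaler = None         # snapshot of the scaler for the best model
        self._best_estimator = None
        self._best_score = -np.inf

    def try_features(self, features, y):
        """Fit on scaled features; keep scaler and estimator if the score improves."""
        scaled = self.scaler.fit_transform(features)
        estimator = LinearRegression().fit(scaled, y)
        score = estimator.score(scaled, y)  # training R^2, assumed as the criterion
        if score > self._best_score:
            self._best_score = score
            # Copy the scaler: self.scaler is overwritten by the next candidate.
            self._best_scaler = copy.deepcopy(self.scaler)
            self._best_estimator = estimator

    def predict(self, features):
        """Scale with the stored best scaler, then predict with the best estimator."""
        return self._best_estimator.predict(self._best_scaler.transform(features))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) * np.array([1.0, 100.0])
y = 2.0 * X[:, 0] + 0.03 * X[:, 1] + 1.0

m = ManualScalingModel()
m.try_features(X[:, :1], y)   # candidate 1: first column only
m.try_features(X, y)          # candidate 2: both columns, fits better
```

The deep copy is the reason a separate _best_scaler is needed at all: the working scaler is refit on every candidate, so its statistics no longer match the best estimator unless they are saved alongside it.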

I think it would be cleaner to use a Pipeline for this:

from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline documents `steps` as a list of (name, estimator) tuples
steps = [("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
model = Pipeline(steps)

The API is still the same: model.fit(x_train, y_train) and model.predict(x_test).
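
For example, a minimal end-to-end run might look like the following. The synthetic data and the train/test split are assumptions made up for illustration; only the Pipeline itself comes from the snippet above.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): columns on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3)) * np.array([1.0, 50.0, 0.01])
y = 2.0 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(scale=0.1, size=120)
x_train, x_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

model = Pipeline([("scaler", StandardScaler()), ("estimator", LassoLarsCV())])
model.fit(x_train, y_train)      # fits the scaler, then the estimator on scaled data
y_pred = model.predict(x_test)   # reuses the scaler statistics learned during fit

# The fitted scaler is stored inside the pipeline, so there is no
# separate scaler attribute to keep in sync with the estimator.
fitted_scaler = model.named_steps["scaler"]
```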
You could even write a Transformer which takes a set of expressions/functions and transforms x to the features.

steps = [("features", MyTransformer(exprs)), ("scaler", StandardScaler()), ("estimator", LassoLarsCV())]
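
A minimal sketch of what such a transformer could look like, assuming each expression is a plain Python callable that maps the input array to one feature column. The name ExpressionTransformer, the example expressions and the data are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ExpressionTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one output column per callable in `exprs`."""

    def __init__(self, exprs):
        self.exprs = exprs  # stored unchanged, as get_params/clone expect

    def fit(self, X, y=None):
        return self  # nothing to learn; the expressions are fixed

    def transform(self, X):
        X = np.asarray(X)
        return np.column_stack([expr(X) for expr in self.exprs])


# Two assumed expressions: an interaction term and a square.
exprs = [lambda X: X[:, 0] * X[:, 1], lambda X: X[:, 0] ** 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] * X[:, 1] - X[:, 0] ** 2 + rng.normal(scale=0.05, size=200)

model = Pipeline([
    ("features", ExpressionTransformer(exprs)),
    ("scaler", StandardScaler()),
    ("estimator", LassoLarsCV()),
])
model.fit(X[:150], y[:150])
y_pred = model.predict(X[150:])  # raw x in, prediction out
```

With this layout predict() receives the raw inputs, and the feature construction, scaling and estimation all happen inside the one object.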

Using the model down the line becomes much simpler, e.g. saving it and using it for estimation in a different context, as everything you need is contained in the pipeline object.
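
As an illustration of the saving part, here is a sketch that round-trips a fitted pipeline through the standard-library pickle module. The data is synthetic and assumed; in practice the bytes would be written to a file and loaded elsewhere.

```python
import pickle

import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Pipeline([("scaler", StandardScaler()), ("estimator", LassoLarsCV())])
model.fit(X, y)

blob = pickle.dumps(model)      # scaler statistics and coefficients travel together
restored = pickle.loads(blob)   # e.g. in another process or script

# The restored pipeline scales and predicts exactly like the original.
same = np.allclose(model.predict(X), restored.predict(X))
```

Two caveats: pickles should only be loaded from trusted sources and with a matching scikit-learn version, and a pipeline step that holds lambdas cannot be pickled with the standard pickle module, so expressions would need to be module-level functions or another serializable form.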

That's a good point. We should use the sklearn Pipeline for this, and for our transformations. Right now predict() manually transforms the data and then calls predict on the best estimator; it should all be combined into one sklearn Pipeline.

fixed in commit 9124540