sberbank-ai-lab/LightAutoML

Be able to put TabularNLPAutoML into an sklearn pipeline

darenr opened this issue · 5 comments

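    from sklearn.pipeline import Pipeline
    from lightautoml.automl.presets.text_presets import TabularNLPAutoML

    steps = [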
        ('automl', TabularNLPAutoML(...))
    ]
    pipeline = Pipeline(steps)

    pred = pipeline.fit_predict(
        df_train,
        roles={"target": TARGET_NAME, "text": TEXT_COLUMNS, "drop": DROP_FEATURES},
    )

produces:

TypeError: Last step of Pipeline should implement fit or be the string 'passthrough'.

I'm doing this because I have some text cleanup transformers that I'd like to pickle into a single "model" object, so that the same cleanup happens at inference time.

Hi @darenr, we don't support sklearn pipelines because LightAutoML has its own data preparation pipeline inside. We also don't have a fit method, only fit_predict, because it would be strange to compute OOF (out-of-fold) predictions and not return them to the user.

You can work around this with a simple idea: do all the preparation before LightAutoML starts, create a new column holding your cleaned text (as a single string, not an array of words), and set it as a text column.

Alex
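The workaround above could look roughly like the sketch below. The cleaning rules, the helper names `clean_text` and `add_clean_text_column`, and the column names `review` / `review_clean` are all hypothetical and only serve as an illustration; `df` is assumed to be a pandas DataFrame.

```python
import re

# Hypothetical cleaning rules, for illustration only.
_TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher
_WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_text(text):
    """Return a cleaned string (not a list of tokens)."""
    text = _TAG_RE.sub(" ", text)   # replace tags with a space
    text = _WS_RE.sub(" ", text)    # collapse whitespace runs
    return text.strip().lower()


def add_clean_text_column(df, source, target):
    """Return a copy of a pandas DataFrame with a cleaned text column added.

    Non-string values (for example NaN) become an empty string.
    """
    out = df.copy()
    out[target] = out[source].map(
        lambda v: clean_text(v) if isinstance(v, str) else ""
    )
    return out


# Usage sketch (column names are assumptions, not from the issue):
# train = add_clean_text_column(df_train, "review", "review_clean")
# roles = {"target": TARGET_NAME, "text": ["review_clean"], "drop": ["review"]}
# oof_pred = automl.fit_predict(train, roles=roles)
```

The same `add_clean_text_column` call would then be applied to new data before prediction, so training and inference see identically cleaned text.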

Thanks for the response. I can do that, but then the pickle object at inference time won't contain the input data pipeline. I wonder if I can pass a valid transformer stage into TabularNLPAutoML?

it's text cleaning that I want to do with a text model

Yep, I understand what you are talking about, and that's why I suggest doing it beforehand, before the model prediction. In that case there is no need to put it inside the pickle object; it can live in code as well.
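If a single object is still wanted, one possible sketch is a thin wrapper that runs the cleaning code and then delegates to the fitted model. This is not a LightAutoML feature: `CleanThenPredict`, `strip_and_lower`, and `LengthModel` are hypothetical names, and `LengthModel` is only a stand-in for a fitted model that exposes `predict`. Note that pickle stores module-level functions by reference, so the cleaning code still has to be importable at inference time.

```python
import pickle


class CleanThenPredict:
    """Hypothetical wrapper: clean the input, then call the model's predict."""

    def __init__(self, cleaner, model):
        self.cleaner = cleaner   # module-level function, pickled by reference
        self.model = model       # any fitted object exposing predict()

    def predict(self, data):
        return self.model.predict(self.cleaner(data))


def strip_and_lower(texts):
    """Hypothetical cleaning step operating on a list of strings."""
    return [t.strip().lower() for t in texts]


class LengthModel:
    """Stand-in for a fitted model; not part of LightAutoML."""

    def predict(self, texts):
        return [len(t) for t in texts]


if __name__ == "__main__":
    bundle = CleanThenPredict(strip_and_lower, LengthModel())
    # Round-trip through pickle, as would happen between training and serving.
    restored = pickle.loads(pickle.dumps(bundle))
    print(restored.predict(["  Hi  ", "OK"]))
```

Whether the fitted AutoML object itself pickles cleanly in a given environment is an assumption to verify separately.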

by the way - amazing library Alex