Be able to put TabularNLPAutoML into an sklearn pipeline
darenr opened this issue · 5 comments
steps = [
    ('automl', TabularNLPAutoML(...))
]
pipeline = Pipeline(steps)
pred = pipeline.fit_predict(
    df_train,
    roles={"target": TARGET_NAME, "text": TEXT_COLUMNS, "drop": DROP_FEATURES},
)
produces:
TypeError: Last step of Pipeline should implement fit or be the string 'passthrough'.
I'm doing this because I have some text clean-up transformers that I'd like to pickle in one "model" object so the same clean-up happens at inference time.
Hi @darenr, we don't work with sklearn pipelines because we have our own specific data preparation pipeline inside. We also don't have a fit method, only fit_predict, because it would be strange to calculate OOF predictions and not return them to the user.
You can work around this with a simple idea: do all the preparation before LightAutoML starts, create a new column with your cleaned text (as text, not an array of words), and set it as a text column.
Alex
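A minimal sketch of the workaround described above, assuming the data is a pandas DataFrame. The names `clean_text`, `add_clean_column`, and the column names `text` and `text_clean` are hypothetical and only for illustration; the actual clean-up rules would be whatever your own transformers do.

```python
import re

# Hypothetical clean-up rules, for illustration only.
_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")


def clean_text(text):
    """Return a single cleaned string (not a list of tokens)."""
    text = "" if text is None else str(text)
    text = _URL_RE.sub(" ", text)   # drop URLs
    text = _WS_RE.sub(" ", text)    # collapse runs of whitespace
    return text.strip().lower()


def add_clean_column(df, text_col, new_col):
    """Return a copy of a pandas DataFrame with a cleaned text column added."""
    out = df.copy()
    out[new_col] = out[text_col].fillna("").astype(str).map(clean_text)
    return out


# Usage sketch (assumes df_train has a "text" column):
# df_train = add_clean_column(df_train, "text", "text_clean")
# roles = {"target": TARGET_NAME, "text": ["text_clean"], "drop": DROP_FEATURES + ["text"]}
```

The same `add_clean_column` call would then be applied to the inference data before prediction, so both stages see identically cleaned text.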
Thanks for the response. I can do that, but then the pickle object at inference time won't include the input data pipeline. I wonder if I can pass a valid transformer stage to TabularNLPAutoML?
it's text cleaning that I want to do with a text model
Yep, I understand what you are talking about, and that's why I suggest doing it beforehand, before the model prediction. In that case there is no need to put it inside the pickle object; it can live in code as well.
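If a single object is still preferred, one possible approach (not something the library provides) is a thin wrapper that applies the same preparation function before every call. This is only a sketch: `PreparedModel` and `prepare` are hypothetical names, and it assumes the wrapped model exposes `fit_predict` and `predict` methods that take the data as their first argument.

```python
class PreparedModel:
    """Hypothetical wrapper: run `prepare` on the data before each model call."""

    def __init__(self, prepare, model):
        self.prepare = prepare  # callable: raw data -> cleaned data
        self.model = model      # object assumed to have fit_predict / predict

    def fit_predict(self, data, **kwargs):
        # Clean the training data, then delegate to the wrapped model.
        return self.model.fit_predict(self.prepare(data), **kwargs)

    def predict(self, data, **kwargs):
        # The same cleaning is applied at inference time.
        return self.model.predict(self.prepare(data), **kwargs)
```

Note that pickle stores a module-level function by reference, not by value, so if such a wrapper is pickled, the module that defines `prepare` still has to be importable wherever the object is loaded.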
by the way - amazing library Alex