MLModelCatalog predict method incompatible with pipeline

Question

JohanvandenHeuvel opened this issue 3 years ago · 5 comments

The MLModelCatalog predict method has the following signature:

def predict(
    self, x: Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]
) -> Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]:

However, if the MLModelCatalog pipeline is enabled, then x is also passed as input to

def perform_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:

That is, the predict function accepts input types that are incompatible with some of the possible model settings.

Answer 1 · 2021-12-10T14:27:01.000Z

The problem is that perform_pipeline can potentially use the encoder, which relies on its input being a DataFrame.

Answer 2 · 2021-12-14T08:52:12.000Z

@JohanvandenHeuvel that means the pipeline cannot work for inputs other than a DataFrame, right? We could make that explicit and not execute the pipeline for tensors?

Answer 3 · 2021-12-14T10:13:58.000Z

It seems that way, yes. And we can't make the pipeline fully work for other inputs; the only step that would work is scaling the data.

Answer 4 · 2021-12-14T10:41:14.000Z

How do we fix this? We could keep it as is and let the pipeline function throw an exception. We could throw an exception ourselves. Or we could only allow the predict function to take DataFrames as input.
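To make the "throw an exception ourselves" option concrete, here is a minimal sketch. SketchModel, its use_pipeline flag, and the dummy row-sum "model" are hypothetical stand-ins for illustration, not the actual MLModelCatalog code; torch and tf tensors are left out so the example only needs numpy and pandas.

```python
from typing import Union

import numpy as np
import pandas as pd


class SketchModel:
    """Hypothetical stand-in for MLModelCatalog, for illustration only."""

    def __init__(self, use_pipeline: bool) -> None:
        self.use_pipeline = use_pipeline

    def perform_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:
        # Placeholder for the encoding/scaling steps that need a DataFrame.
        return df

    def predict(self, x: Union[np.ndarray, pd.DataFrame]) -> np.ndarray:
        if self.use_pipeline:
            # Fail early with a clear message instead of letting the
            # pipeline break somewhere inside the encoder.
            if not isinstance(x, pd.DataFrame):
                raise TypeError(
                    "pipeline is enabled, so predict expects a pandas "
                    f"DataFrame, got {type(x).__name__}"
                )
            x = self.perform_pipeline(x)
        # Dummy "model" (assumption): sum of the features in each row.
        return np.asarray(x, dtype=float).sum(axis=1)
```

With use_pipeline=True, passing a numpy array raises a TypeError right away; with use_pipeline=False, both arrays and DataFrames are accepted.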

I guess you could normalize a DataFrame, convert it to e.g. a Tensor, and give this "pipelined" Tensor as input to the predict function. But like in #115, this results in having two ways to normalize, and this isn't very consistent IMO.
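A rough sketch of the workaround described above, assuming min-max scaling as the normalization and a float32 numpy array standing in for the Tensor. The helper min_max_normalize and the column names are made up for the example.

```python
import numpy as np
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Scale each column to [0, 1]. Assumes no column is constant,
    # otherwise the division below would be by zero.
    lo, hi = df.min(), df.max()
    return (df - lo) / (hi - lo)


df = pd.DataFrame(
    {"age": [20.0, 30.0, 40.0], "income": [1000.0, 3000.0, 2000.0]}
)

# Normalize while the data is still a DataFrame with column names ...
normalized = min_max_normalize(df)

# ... then convert to an array; a torch or tf tensor could be built
# from this array in the same way.
x = normalized.to_numpy(dtype=np.float32)
```

The array x is already "pipelined", so predict would have to skip its own normalization for it, which is exactly the second normalization path the comment above objects to.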

Answer 5 · 2022-03-22T08:37:45.000Z

Fixed by moving normalization to the data part.