MLModelCatalog predict method incompatible with pipeline
JohanvandenHeuvel opened this issue · 5 comments
The MLModelCatalog predict method has the following signature:
```python
def predict(
    self, x: Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]
) -> Union[np.ndarray, pd.DataFrame, torch.Tensor, tf.Tensor]:
```
However, if the MLModelCatalog pipeline is enabled, x is also passed as input to
```python
def perform_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:
```
That is, predict accepts input types that are incompatible with some model settings: with the pipeline enabled, only a DataFrame is valid.
The problem is that perform_pipeline may use the encoder, which relies on its input being a DataFrame.
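A minimal sketch of the failure mode follows. It assumes a hypothetical `encode` step that one-hot encodes non-numeric columns; this is not the library's actual encoder. The step reads `df.columns` and per-column dtypes. A DataFrame has both, but a NumPy array (or a tensor) has neither.

```python
import numpy as np
import pandas as pd


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical encoding step: one-hot encode every non-numeric column.

    It needs `df.columns` and per-column dtypes, which only a DataFrame has.
    """
    categorical = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    return pd.get_dummies(df, columns=categorical, dtype=float)


df = pd.DataFrame({"age": [25.0, 40.0], "sex": ["m", "f"]})
encoded = encode(df)  # works on a DataFrame

try:
    encode(df.to_numpy())  # a bare ndarray has no .columns attribute
    array_error = None
except AttributeError as err:
    array_error = err
```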
@JohanvandenHeuvel, that means the pipeline cannot work for inputs other than DataFrames, right? Could we make that explicit and skip the pipeline for tensors?
It seems that way, yes. We can't make the full pipeline work for other inputs; only the scaling step would carry over.
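The "skip the pipeline for tensors" idea can be sketched as follows. `SketchModel`, its fitted statistics, and `_raw_predict` are hypothetical stand-ins. Only the `predict`/`perform_pipeline` names come from the issue. DataFrames go through the pipeline. Any other input is assumed to be preprocessed already and goes straight to the model.

```python
import numpy as np
import pandas as pd


class SketchModel:
    """Hypothetical model wrapper; names mirror the issue, not the library's code."""

    def __init__(self, use_pipeline: bool = True):
        self.use_pipeline = use_pipeline
        # Assumed fitted statistics for a column-wise standardization step.
        self.mean = pd.Series({"age": 30.0, "income": 50.0})
        self.std = pd.Series({"age": 10.0, "income": 25.0})

    def perform_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:
        # Aligns the statistics on column names, so it needs a DataFrame.
        return (df - self.mean) / self.std

    def _raw_predict(self, x) -> np.ndarray:
        # Stand-in for the real model: the sum of the features per row.
        return np.asarray(x, dtype=float).sum(axis=1)

    def predict(self, x) -> np.ndarray:
        if isinstance(x, pd.DataFrame):
            if self.use_pipeline:
                x = self.perform_pipeline(x)
            return self._raw_predict(x.to_numpy())
        # Non-DataFrame input: assumed already preprocessed, pipeline skipped.
        return self._raw_predict(x)
```

The catch is that callers passing arrays must do the preprocessing themselves.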
How do we fix this? We could keep it as is and let the pipeline function throw an exception, throw an exception ourselves, or only allow DataFrames as input to the predict function.
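Of these options, throwing an exception ourselves fails early with a clear message. Otherwise the caller gets whatever error the encoder happens to raise. Here is a sketch, with `check_pipeline_input` as a hypothetical helper that `predict` would call first:

```python
import pandas as pd


def check_pipeline_input(x, use_pipeline: bool) -> None:
    """Hypothetical guard: reject non-DataFrame input while the pipeline is on."""
    if use_pipeline and not isinstance(x, pd.DataFrame):
        raise TypeError(
            "predict() with the pipeline enabled expects a pandas.DataFrame, "
            f"got {type(x).__name__}; pass a DataFrame or disable the pipeline"
        )
```

Calling it with `use_pipeline=True` regardless of the model settings turns it into the third option, DataFrames only.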
I guess you could normalize a DataFrame, convert it to e.g. a Tensor, and pass this "pipelined" Tensor to the predict function. But, as in #115, this results in two ways of normalizing, which isn't very consistent IMO.
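The normalize-then-convert workaround is sketched below, using hypothetical fitted statistics `MEAN`/`STD`. The sketch also shows one assumed pitfall of having two normalization paths. Suppose a caller normalizes a DataFrame up front and then passes it to a model whose pipeline is still enabled. The data then gets normalized twice.

```python
import numpy as np
import pandas as pd

# Hypothetical fitted statistics, shared by both normalization paths.
MEAN = pd.Series({"age": 30.0, "income": 50.0})
STD = pd.Series({"age": 10.0, "income": 25.0})


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Column-wise standardization, aligned on column names."""
    return (df - MEAN) / STD


raw = pd.DataFrame({"age": [30.0, 40.0], "income": [50.0, 100.0]})

# Workaround: normalize outside the model, then hand predict a plain array.
pipelined = normalize(raw).to_numpy()

# Pitfall: normalized once by the caller and again by the model's pipeline.
twice = normalize(normalize(raw)).to_numpy()
```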
Fixed by moving normalization to the data part.
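The following is only a sketch of how such a fix could look. `SketchData` and `SketchModel` are hypothetical classes, not the library's actual data or model API. The data object fits and applies normalization once. The model's `predict` no longer preprocesses anything, so DataFrames and arrays are treated the same way.

```python
from typing import List

import numpy as np
import pandas as pd


class SketchData:
    """Hypothetical data wrapper that owns normalization."""

    def __init__(self, df: pd.DataFrame, continuous: List[str]):
        self.continuous = continuous
        # Fit the statistics once, on the data this object wraps.
        self.mean = df[continuous].mean()
        self.std = df[continuous].std(ddof=0)
        self.df = self.transform(df)

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        """Apply the fitted normalization to any DataFrame with the same columns."""
        out = df.copy()
        out[self.continuous] = (df[self.continuous] - self.mean) / self.std
        return out


class SketchModel:
    """Hypothetical linear model: predict does no preprocessing of its own."""

    def __init__(self, weights: np.ndarray):
        self.weights = weights

    def predict(self, x) -> np.ndarray:
        # DataFrames and arrays are handled identically.
        return np.asarray(x, dtype=float) @ self.weights
```

New inputs go through `data.transform(...)` before `predict`, so normalization happens in one place only.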