kserve/modelmesh-serving

Failed to load model while following the tutorial 'Creating a custom serving runtime in KServe ModelMesh'

JimBeam2019 opened this issue · 3 comments

Describe the bug

While following the tutorial 'Creating a custom serving runtime in KServe ModelMesh' from the IBM site, I made one small adjustment: loading the sklearn mnist-svm.joblib model from the built-in localMinIO storage instead. However, the model failed to load, and the MLServer adapter logged the error MLServer Adapter.MLServer Adapter Server.LoadModel MLServer failed to load model {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}.

I am wondering whether this is a bug or a mistake in my configuration. Any advice would be appreciated, and please let me know if you need any further details.

To Reproduce
Steps to reproduce the behavior:

  1. Install ModelMesh Serving in a local minikube cluster, following the installation instructions.
  2. Create the custom ML model, custom_model.py, with the code below (a local smoke-test sketch follows the code).
from mlserver.model import MLModel
from mlserver.utils import get_model_uri
from mlserver.errors import InferenceError
from mlserver.codecs import DecodedParameterName
from mlserver.types import (
    InferenceRequest,
    InferenceResponse,
    ResponseOutput,
)
import logging
from joblib import load
import numpy as np

from os.path import exists

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

_to_exclude = {
    "parameters": {DecodedParameterName, "headers"},
    'inputs': {"__all__": {"parameters": {DecodedParameterName, "headers"}}}
}

WELLKNOWN_MODEL_FILENAMES = ["mnist-svm.joblib"]

class CustomMLModel(MLModel):

  async def load(self) -> bool:
    # Resolve the model file location from the model settings,
    # falling back to the well-known filenames above
    model_uri = await get_model_uri(
       self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES
    )
    logger.info(f"Model load URI: {model_uri}")

    if exists(model_uri):
      logger.info(f"Loading MNIST model from {model_uri}")
      self._model = load(model_uri)
      logger.info("Model loaded successfully")
    else:
      logger.error(f"Model does not exist at {model_uri}")
      self.ready = False
      return self.ready

    self.ready = True
    return self.ready

  async def predict(self, payload: InferenceRequest) -> InferenceResponse:
    # Gather the data and names of all inputs in the request
    input_data = [inp.data for inp in payload.inputs]
    input_names = [inp.name for inp in payload.inputs]
    input_data_array = np.array(input_data)
    result = self._model.predict(input_data_array)
    predictions = np.array(result)

    logger.info(f"Predict result is: {result}")
    return InferenceResponse(
        id=payload.id,
        model_name=self.name,
        model_version=self.version,
        outputs=[
            ResponseOutput(
                name=str(input_names[0]),
                shape=predictions.shape,
                datatype="INT64",
                data=predictions.tolist(),
            )
        ],
    )
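As a local sanity check (not part of the tutorial), the class above can be exercised directly through mlserver's settings objects. A minimal sketch, assuming mlserver==1.3.2 is installed and mnist-svm.joblib sits in the current directory:

# smoke_test.py -- hypothetical local check, not part of the tutorial.
import asyncio

from mlserver.settings import ModelParameters, ModelSettings
from mlserver.types import InferenceRequest, RequestInput

from custom_model import CustomMLModel


async def main():
    settings = ModelSettings(
        name="mnist-svm",
        implementation=CustomMLModel,
        parameters=ModelParameters(uri="./mnist-svm.joblib"),
    )
    model = CustomMLModel(settings)
    assert await model.load(), "model failed to load"

    # One 8x8 digit flattened to 64 features (zeros as a dummy input)
    request = InferenceRequest(
        inputs=[RequestInput(name="predict", shape=[1, 64],
                             datatype="FP32", data=[0.0] * 64)]
    )
    response = await model.predict(request)
    print(response.outputs[0].data)


asyncio.run(main())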
  3. Build a Docker image from the Dockerfile below, tagged dev.local/xgb-model:dev.2405042123 (see the note after the Dockerfile).
FROM python:3.9.13

RUN pip3 install --no-cache-dir mlserver==1.3.2 scikit-learn==1.4.0 joblib==1.3.2

COPY --chown=${USER} ./custom_model.py /opt/custom_model.py
ENV PYTHONPATH=/opt/
WORKDIR /opt

ENV MLSERVER_MODELS_DIR=/models/_mlserver_models \
    MLSERVER_GRPC_PORT=8001 \
    MLSERVER_HTTP_PORT=8002 \
    MLSERVER_METRICS_PORT=8082 \
    MLSERVER_LOAD_MODELS_AT_STARTUP=false \
    MLSERVER_DEBUG=false \
    MLSERVER_PARALLEL_WORKERS=1 \
    MLSERVER_GRPC_MAX_MESSAGE_LENGTH=33554432 \
    # https://github.com/SeldonIO/MLServer/pull/748
    MLSERVER__CUSTOM_GRPC_SERVER_SETTINGS='{"grpc.max_metadata_size": "32768"}' \
    MLSERVER_MODEL_NAME=dummy-model

ENV MLSERVER_MODEL_IMPLEMENTATION=custom_model.CustomMLModel

CMD ["mlserver", "start", "${MLSERVER_MODELS_DIR}"]
  4. Create a ServingRuntime with the YAML below.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: custom-runtime-0.x
spec:
  supportedModelFormats:
    - name: custom-model
      version: "1"
      autoSelect: true
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  containers:
    - name: mlserver
      image: dev.local/xgb-model:dev.2405042123
      imagePullPolicy: IfNotPresent
      env:
        - name: MLSERVER_MODELS_DIR
          value: "/models/_mlserver_models/"
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: "127.0.0.1"
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "-1"
        - name: MLSERVER_MODEL_IMPLEMENTATION
          value: "custom_model.CustomMLModel"
        - name: MLSERVER_DEBUG
          value: "true"
        - name: MLSERVER_MODEL_PARALLEL_WORKERS
          value: "0"
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "2"
          memory: "1Gi"
  builtInAdapter:
    serverType: mlserver
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
  5. Create an InferenceService with the YAML below (a sketch for double-checking the storage path follows it).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: minio-model-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: custom-model
      runtime: custom-runtime-0.x
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
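To double-check that storage.key and storage.path point at a real object, the quickstart MinIO can be listed directly. A minimal sketch, assuming the minio service is port-forwarded to localhost:9000 and that the bucket name, access key, and secret key (placeholders below) are taken from the storage-config secret:

# check_minio.py -- hypothetical helper, not part of the tutorial.
# Assumes: kubectl port-forward svc/minio 9000:9000 is running, and that
# the endpoint, credentials, and bucket below are replaced with the real
# values from the storage-config secret.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # placeholder endpoint
    aws_access_key_id="<ACCESS_KEY>",      # placeholder credential
    aws_secret_access_key="<SECRET_KEY>",  # placeholder credential
)

# Bucket name is an assumption based on the ModelMesh quickstart
resp = s3.list_objects_v2(Bucket="modelmesh-example-models",
                          Prefix="sklearn/mnist-svm.joblib")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])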
  6. After the ModelMesh pods start running, open the logs of the mlserver-adapter container.

Expected behavior

The model should have loaded successfully.

Screenshots

Instead, the mlserver-adapter shows the logs below.

2024-05-04T14:00:34Z    INFO    MLServer Adapter        Starting MLServer Adapter       {"adapter_config": {"Port":8085,"MLServerPort":8001,"MLServerContainerMemReqBytes":1073741824,"MLServerMemBufferBytes":134217728,"CapacityInBytes":939524096,"MaxLoadingConcurrency":1,"ModelLoadingTimeoutMS":90000,"DefaultModelSizeInBytes":1000000,"ModelSizeMultiplier":1.25,"RuntimeVersion":"dev.2405042123","LimitModelConcurrency":0,"RootModelDir":"/models/_mlserver_models","UseEmbeddedPuller":true}}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Created root MLServer model directory   {"path": "/models/_mlserver_models"}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Connecting to MLServer...       {"port": 8001}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Initializing Puller     {"Dir": "/models"}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        MLServer runtime adapter started
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server.client-cache   starting clean up of cached clients
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Adapter will run at port        {"port": 8085, "MLServer port": 8001}
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Adapter gRPC Server registered, now serving
2024-05-04T14:00:44Z    INFO    MLServer Adapter.MLServer Adapter Server        Using runtime version returned by MLServer      {"version": "1.3.2"}
2024-05-04T14:00:44Z    INFO    MLServer Adapter.MLServer Adapter Server        runtimeStatus   {"Status": "status:READY capacityInBytes:939524096 maxLoadingConcurrency:1 modelLoadingTimeoutMs:90000 defaultModelSizeInBytes:1000000 runtimeVersion:\"1.3.2\" methodInfos:{key:\"inference.GRPCInferenceService/ModelInfer\" value:{idInjectionPath:1}} methodInfos:{key:\"inference.GRPCInferenceService/ModelMetadata\" value:{idInjectionPath:1}}"}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Model details   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "modelType": "custom-model", "modelPath": "sklearn/mnist-svm.joblib"}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        Reading storage credentials
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        creating new repository client  {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e"}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        found objects to download       {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "count": 1}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        downloading object      {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "filename": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib"}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server        Calculated disk size    {"modelFullPath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "disk_size": 344817}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Generated model settings file   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "schemaPath": "", "implementation": ""}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Adapted model directory for standalone file/dir {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "sourcePath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "isDir": false, "symLinkPath": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "generatedSettingsFile": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/model-settings.json"}
2024-05-04T14:00:52Z    ERROR   MLServer Adapter.MLServer Adapter Server.LoadModel      MLServer failed to load model   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
github.com/kserve/modelmesh-runtime-adapter/model-mesh-mlserver-adapter/server.(*MLServerAdapterServer).LoadModel
        /opt/app/model-mesh-mlserver-adapter/server/server.go:137
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
        /opt/app/internal/proto/mmesh/model-runtime_grpc.pb.go:206
google.golang.org/grpc.(*Server).processUnaryRPC
        /root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:1335
google.golang.org/grpc.(*Server).handleStream
        /root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:1712
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:947
2024-05-04T14:00:53Z    INFO    MLServer Adapter.MLServer Adapter Server.UnloadModel    Unload request for model not found in MLServer  {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
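As an additional debugging step, MLServer's model repository can be queried over gRPC to list the model names it actually knows about. A minimal sketch, assuming the mlserver container's gRPC port is port-forwarded (kubectl port-forward <modelmesh-pod> 8001:8001) and the mlserver package, which bundles these stubs, is installed locally:

# repo_index.py -- hypothetical debugging aid, not part of the tutorial.
import grpc

from mlserver.grpc.model_repository_pb2 import RepositoryIndexRequest
from mlserver.grpc.model_repository_pb2_grpc import ModelRepositoryServiceStub

channel = grpc.insecure_channel("localhost:8001")
stub = ModelRepositoryServiceStub(channel)

# ready=False lists every model in the repository, loaded or not; the
# failing modelId should show up here once the adapter has written its
# model-settings.json directory under the root model dir.
index = stub.RepositoryIndex(RepositoryIndexRequest(ready=False))
for model in index.models:
    print(model.name, model.state, model.reason)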

Environment (please complete the following information):

  • OS: Ubuntu 22.04.4 LTS

Additional context

The only change I made to the tutorial was loading the sklearn mnist-svm.joblib model from the built-in localMinIO storage instead.

Did the tutorial or example work without making changes?

@rafvasq -- can you spot something obvious? I would have to go through your tutorial myself and debug 😊