Failed to load model while following the tutorial 'Creating a custom serving runtime in KServe ModelMesh'
JimBeam2019 opened this issue · 3 comments
Describe the bug
While following the tutorial 'Creating a custom serving runtime in KServe ModelMesh' on the IBM site, I made one small adjustment: loading the sklearn mnist-svm.joblib model from localMinIO instead. However, the model fails to load, and the adapter returns the error message MLServer Adapter.MLServer Adapter Server.LoadModel MLServer failed to load model {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}.
I am wondering whether this is a bug or whether I have made a mistake in the configuration. Please let me know if you need any further details; any advice would be much appreciated.
To Reproduce
Steps to reproduce the behavior:
- Install ModelMesh Serving in a local minikube cluster, following the instructions
- Create the custom ML model class with the code below (a minimal local smoke test for this class is sketched after these steps).
from mlserver.model import MLModel
from mlserver.utils import get_model_uri
from mlserver.errors import InferenceError
from mlserver.codecs import DecodedParameterName
from mlserver.types import (
    InferenceRequest,
    InferenceResponse,
    ResponseOutput,
)

import logging
from joblib import load
import numpy as np
from os.path import exists

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

_to_exclude = {
    "parameters": {DecodedParameterName, "headers"},
    "inputs": {"__all__": {"parameters": {DecodedParameterName, "headers"}}},
}

WELLKNOWN_MODEL_FILENAMES = ["mnist-svm.joblib"]


class CustomMLModel(MLModel):
    async def load(self) -> bool:
        # Resolve the model file from the model settings / well-known filenames.
        model_uri = await get_model_uri(
            self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES
        )
        logger.info(f"Model load URI: {model_uri}")
        if exists(model_uri):
            logger.info(f"Loading MNIST model from {model_uri}")
            self._model = load(model_uri)
            logger.info("Model loaded successfully")
            self.ready = True
        else:
            logger.info(f"Model does not exist at {model_uri}")
            self.ready = False
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        input_data = [inp.data for inp in payload.inputs]
        input_name = [inp.name for inp in payload.inputs]
        input_data_array = np.array(input_data)
        result = self._model.predict(input_data_array)
        predictions = np.array(result)
        logger.info(f"Predict result is: {result}")

        return InferenceResponse(
            id=payload.id,
            model_name=self.name,
            model_version=self.version,
            outputs=[
                ResponseOutput(
                    name=str(input_name[0]),
                    shape=list(predictions.shape),
                    datatype="INT64",
                    data=predictions.tolist(),
                )
            ],
        )
- Build a docker image with the Dockerfile below, named dev.local/xgb-model:dev.2405042123
FROM python:3.9.13
RUN pip3 install --no-cache-dir mlserver==1.3.2 scikit-learn==1.4.0 joblib==1.3.2
COPY --chown=${USER} ./custom_model.py /opt/custom_model.py
ENV PYTHONPATH=/opt/
WORKDIR /opt
ENV MLSERVER_MODELS_DIR=/models/_mlserver_models \
MLSERVER_GRPC_PORT=8001 \
MLSERVER_HTTP_PORT=8002 \
MLSERVER_METRICS_PORT=8082 \
MLSERVER_LOAD_MODELS_AT_STARTUP=false \
MLSERVER_DEBUG=false \
MLSERVER_PARALLEL_WORKERS=1 \
MLSERVER_GRPC_MAX_MESSAGE_LENGTH=33554432 \
# https://github.com/SeldonIO/MLServer/pull/748
MLSERVER__CUSTOM_GRPC_SERVER_SETTINGS='{"grpc.max_metadata_size": "32768"}' \
MLSERVER_MODEL_NAME=dummy-model
ENV MLSERVER_MODEL_IMPLEMENTATION=custom_model.CustomMLModel
CMD ["mlserver", "start", "${MLSERVER_MODELS_DIR}"]
- Create a serving runtime with the yaml file below
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: custom-runtime-0.x
spec:
  supportedModelFormats:
    - name: custom-model
      version: "1"
      autoSelect: true
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  containers:
    - name: mlserver
      image: dev.local/xgb-model:dev.2405042123
      imagePullPolicy: IfNotPresent
      env:
        - name: MLSERVER_MODELS_DIR
          value: "/models/_mlserver_models/"
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: "127.0.0.1"
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "-1"
        - name: MLSERVER_MODEL_IMPLEMENTATION
          value: "custom_model.CustomMLModel"
        - name: MLSERVER_DEBUG
          value: "true"
        - name: MLSERVER_MODEL_PARALLEL_WORKERS
          value: "0"
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "2"
          memory: "1Gi"
  builtInAdapter:
    serverType: mlserver
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
- Create an inference service with the yaml file below (a sketch of querying it over gRPC is included after these steps)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: minio-model-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: custom-model
      runtime: custom-runtime-0.x
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
- After the modelmesh pods start running, open the logs of the mlserver-adapter container.
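For reference, here is the minimal local smoke test mentioned in the custom-model step above. It is only a sketch, assuming mlserver==1.3.2 is installed locally and a copy of mnist-svm.joblib sits in the working directory; the model name "mnist-svm", the input name "predict", and the zero-valued 1x64 payload (the shape used by the sklearn digits example this model comes from) are placeholders rather than values taken from the deployment.

# Local smoke test for CustomMLModel (sketch only; paths and names are assumptions).
import asyncio

import numpy as np
from mlserver.settings import ModelParameters, ModelSettings
from mlserver.types import InferenceRequest, RequestInput

from custom_model import CustomMLModel


async def main() -> None:
    # Point the runtime at a local copy of the joblib file instead of the
    # ModelMesh-managed model directory.
    settings = ModelSettings(
        name="mnist-svm",
        implementation=CustomMLModel,
        parameters=ModelParameters(uri="./mnist-svm.joblib"),
    )
    model = CustomMLModel(settings)
    assert await model.load(), "load() returned False - is ./mnist-svm.joblib present?"

    # One dummy 8x8 digits sample (64 features), flattened into the V2 payload.
    sample = np.zeros((1, 64), dtype=np.float32)
    request = InferenceRequest(
        inputs=[
            RequestInput(
                name="predict",
                shape=list(sample.shape),
                datatype="FP32",
                data=sample.flatten().tolist(),
            )
        ]
    )
    response = await model.predict(request)
    print(response.outputs[0].data)


if __name__ == "__main__":
    asyncio.run(main())

Running this against the same joblib file that was uploaded to localMinIO should confirm that the class itself loads and predicts outside of ModelMesh.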
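Similarly, once the model loads, this is how I would try to query the deployed InferenceService over gRPC. Again only a sketch: it assumes kubectl port-forward service/modelmesh-serving 8033 is running against the modelmesh-serving namespace (as in the ModelMesh Serving quickstart) and reuses the V2 dataplane stubs bundled with the mlserver package; the input name and zero-valued payload are the same placeholders as above.

# Sketch of a KServe V2 gRPC inference call routed through ModelMesh.
# Assumes `kubectl port-forward service/modelmesh-serving 8033` is active and that
# the mlserver package (which ships the generated V2 dataplane stubs) is installed.
import grpc
from mlserver.grpc import dataplane_pb2 as pb
from mlserver.grpc import dataplane_pb2_grpc as pb_grpc


def main() -> None:
    channel = grpc.insecure_channel("localhost:8033")
    stub = pb_grpc.GRPCInferenceServiceStub(channel)

    # ModelMesh routes the request using the InferenceService name as model_name.
    request = pb.ModelInferRequest(
        model_name="minio-model-isvc",
        inputs=[
            pb.ModelInferRequest.InferInputTensor(
                name="predict",  # placeholder input name
                datatype="FP32",
                shape=[1, 64],
                contents=pb.InferTensorContents(fp32_contents=[0.0] * 64),
            )
        ],
    )
    print(stub.ModelInfer(request))


if __name__ == "__main__":
    main()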
Expected behavior
The model should have loaded successfully.
Screenshots
Instead, the mlserver-adapter logs show the following.
2024-05-04T14:00:34Z INFO MLServer Adapter Starting MLServer Adapter {"adapter_config": {"Port":8085,"MLServerPort":8001,"MLServerContainerMemReqBytes":1073741824,"MLServerMemBufferBytes":134217728,"CapacityInBytes":939524096,"MaxLoadingConcurrency":1,"ModelLoadingTimeoutMS":90000,"DefaultModelSizeInBytes":1000000,"ModelSizeMultiplier":1.25,"RuntimeVersion":"dev.2405042123","LimitModelConcurrency":0,"RootModelDir":"/models/_mlserver_models","UseEmbeddedPuller":true}}
2024-05-04T14:00:34Z INFO MLServer Adapter.MLServer Adapter Server Created root MLServer model directory {"path": "/models/_mlserver_models"}
2024-05-04T14:00:34Z INFO MLServer Adapter.MLServer Adapter Server Connecting to MLServer... {"port": 8001}
2024-05-04T14:00:34Z INFO MLServer Adapter.MLServer Adapter Server Initializing Puller {"Dir": "/models"}
2024-05-04T14:00:34Z INFO MLServer Adapter.MLServer Adapter Server MLServer runtime adapter started
2024-05-04T14:00:34Z INFO MLServer Adapter.MLServer Adapter Server.client-cache starting clean up of cached clients
2024-05-04T14:00:34Z INFO MLServer Adapter Adapter will run at port {"port": 8085, "MLServer port": 8001}
2024-05-04T14:00:34Z INFO MLServer Adapter Adapter gRPC Server registered, now serving
2024-05-04T14:00:44Z INFO MLServer Adapter.MLServer Adapter Server Using runtime version returned by MLServer {"version": "1.3.2"}
2024-05-04T14:00:44Z INFO MLServer Adapter.MLServer Adapter Server runtimeStatus {"Status": "status:READY capacityInBytes:939524096 maxLoadingConcurrency:1 modelLoadingTimeoutMs:90000 defaultModelSizeInBytes:1000000 runtimeVersion:\"1.3.2\" methodInfos:{key:\"inference.GRPCInferenceService/ModelInfer\" value:{idInjectionPath:1}} methodInfos:{key:\"inference.GRPCInferenceService/ModelMetadata\" value:{idInjectionPath:1}}"}
2024-05-04T14:00:52Z INFO MLServer Adapter.MLServer Adapter Server.LoadModel Model details {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "modelType": "custom-model", "modelPath": "sklearn/mnist-svm.joblib"}
2024-05-04T14:00:52Z DEBUG MLServer Adapter.MLServer Adapter Server Reading storage credentials
2024-05-04T14:00:52Z DEBUG MLServer Adapter.MLServer Adapter Server creating new repository client {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e"}
2024-05-04T14:00:52Z DEBUG MLServer Adapter.MLServer Adapter Server found objects to download {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "count": 1}
2024-05-04T14:00:52Z DEBUG MLServer Adapter.MLServer Adapter Server downloading object {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "filename": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib"}
2024-05-04T14:00:52Z INFO MLServer Adapter.MLServer Adapter Server Calculated disk size {"modelFullPath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "disk_size": 344817}
2024-05-04T14:00:52Z INFO MLServer Adapter.MLServer Adapter Server.LoadModel Generated model settings file {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "schemaPath": "", "implementation": ""}
2024-05-04T14:00:52Z INFO MLServer Adapter.MLServer Adapter Server.LoadModel Adapted model directory for standalone file/dir {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "sourcePath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "isDir": false, "symLinkPath": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "generatedSettingsFile": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/model-settings.json"}
2024-05-04T14:00:52Z ERROR MLServer Adapter.MLServer Adapter Server.LoadModel MLServer failed to load model {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
github.com/kserve/modelmesh-runtime-adapter/model-mesh-mlserver-adapter/server.(*MLServerAdapterServer).LoadModel
/opt/app/model-mesh-mlserver-adapter/server/server.go:137
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
/opt/app/internal/proto/mmesh/model-runtime_grpc.pb.go:206
google.golang.org/grpc.(*Server).processUnaryRPC
/root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:1335
google.golang.org/grpc.(*Server).handleStream
/root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:1712
google.golang.org/grpc.(*Server).serveStreams.func1.1
/root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:947
2024-05-04T14:00:53Z INFO MLServer Adapter.MLServer Adapter Server.UnloadModel Unload request for model not found in MLServer {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
Environment (please complete the following information):
- OS: Ubuntu 22.04.4 LTS
Additional context