nv-morpheus/Morpheus

[BUG]: MLflow won't load phishing-bert-onnx model


Version

24.03

Which installation method(s) does this occur on?

Kubernetes

Describe the bug.

Deployed Morpheus from NGC using the Helm charts. The phishing-bert-onnx model fails to load in Triton when deployed through MLflow.

Minimum reproducible example

export API_KEY="<NGC KEY>"
export NAMESPACE="morpheus"
export RELEASE="testing"


helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-ai-engine-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar
helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-mlflow-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar
helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-sdk-client-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar

helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-engine" morpheus-ai-engine
helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-helper" morpheus-sdk-client

(once the sdk-cli pod is Running)

kubectl -n "${NAMESPACE}" exec "sdk-cli-${RELEASE}-helper" -- cp -RL /workspace/models /common

helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-mlflow" morpheus-mlflow

kubectl -n "${NAMESPACE}" exec -it deploy/mlflow -- bash

python publish_model_to_mlflow.py \
      --model_name phishing-bert-onnx \
      --model_directory /common/models/triton-model-repo/phishing-bert-onnx \
      --flavor triton

mlflow deployments create -t triton \
      --flavor triton \
      --name phishing-bert-onnx \
      -m models:/phishing-bert-onnx/1 \
      -C "version=1"
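As a sanity check (not part of the original report), the copied repository layout can be verified before publishing: Triton expects each model directory to contain a config.pbtxt and at least one numeric version subdirectory. A rough sketch, assuming the paths from the steps above:

```shell
# check_triton_repo: flag model directories that are missing a config.pbtxt
# or a numeric version subdirectory (e.g. 1/). The path argument below is
# illustrative; adjust it to wherever the repo was actually copied.
check_triton_repo() {
  repo="$1"
  for d in "$repo"/*/; do
    name=$(basename "$d")
    [ -f "${d}config.pbtxt" ] || echo "MISSING config.pbtxt: $name"
    ls -d "${d}"[0-9]*/ >/dev/null 2>&1 || echo "MISSING version dir: $name"
  done
}

# Example (inside the sdk-cli pod):
# check_triton_repo /common/models/triton-model-repo
```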

Relevant log output

Triton Logs

E0604 16:12:24.504123 1 model_repository_manager.cc:1335] Poll failed for model directory 'phishing-bert-onnx': Invalid model name: Could not determine backend for model 'phishing-bert-onnx' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.

Deployment Creation Logs

Successfully registered model 'phishing-bert-onnx'.
Created version '1' of model 'phishing-bert-onnx'.
/mlflow/artifacts/0/4281c565f9ef489880c9940e35992f54/artifacts
Saved mlflow-meta.json to /common/triton-model-repo/phishing-bert-onnx
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow_triton/deployments.py", line 115, in create_deployment
    self.triton_client.load_model(name)
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/tritonclient/http/_client.py", line 669, in load_model
    _raise_if_error(response)
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/tritonclient/http/_utils.py", line 69, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: [500] failed to load 'phishing-bert-onnx', failed to poll from model repository

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow/deployments/cli.py", line 151, in create_deployment
    deployment = client.create_deployment(name, model_uri, flavor, config=config_dict)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow_triton/deployments.py", line 117, in create_deployment
    raise MlflowException(str(ex))
mlflow.exceptions.MlflowException: [500] failed to load 'phishing-bert-onnx', failed to poll from model repository
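The Triton error above ("no backend in model configuration") means the server found no `backend` or `platform` field in the model's config.pbtxt; when the config is missing or empty, Triton falls back to inferring the backend from a model file named model.<backend_name>, which matches the "Expected model name of the form" wording. For comparison only — not the actual phishing-bert-onnx configuration — a minimal config.pbtxt for an ONNX model looks roughly like:

```
name: "phishing-bert-onnx"
backend: "onnxruntime"
max_batch_size: 32
```

Checking whether config.pbtxt survived the `cp -RL` copy into /common may narrow down whether this is a packaging issue or a copy issue.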

Full env printout

No response

Other/Misc.

No response

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report