nv-morpheus/Morpheus

[BUG]: MLflow won't load phishing-bert-onnx model


Version

24.03

Which installation method(s) does this occur on?

Kubernetes

Describe the bug.

Deployed Morpheus from NGC using the Helm charts. The phishing-bert-onnx model fails to load in Triton when deployed through MLflow.

Minimum reproducible example

export API_KEY="<NGC KEY>"
export NAMESPACE="morpheus"
export RELEASE="testing"


helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-ai-engine-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar
helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-mlflow-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar
helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-sdk-client-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar

helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-engine" morpheus-ai-engine
helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-helper" morpheus-sdk-client

(once the sdk-cli pod is Running)

kubectl -n "${NAMESPACE}" exec "sdk-cli-${RELEASE}-helper" -- cp -RL /workspace/models /common

helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-mlflow" morpheus-mlflow

kubectl -n "${NAMESPACE}" exec -it deploy/mlflow -- bash

python publish_model_to_mlflow.py \
      --model_name phishing-bert-onnx \
      --model_directory /common/models/triton-model-repo/phishing-bert-onnx \
      --flavor triton

mlflow deployments create -t triton \
      --flavor triton \
      --name phishing-bert-onnx \
      -m models:/phishing-bert-onnx/1 \
      -C "version=1"
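As a sanity check (not part of the original report), the copied repository layout can be verified before publishing: Triton expects each model directory to contain a config.pbtxt and at least one numeric version subdirectory. A rough sketch, assuming the paths from the steps above:

```shell
# check_triton_repo: flag model directories that are missing a config.pbtxt
# or a numeric version subdirectory (e.g. 1/). The path argument below is
# illustrative; adjust it to wherever the repo was actually copied.
check_triton_repo() {
  repo="$1"
  for d in "$repo"/*/; do
    name=$(basename "$d")
    [ -f "${d}config.pbtxt" ] || echo "MISSING config.pbtxt: $name"
    ls -d "${d}"[0-9]*/ >/dev/null 2>&1 || echo "MISSING version dir: $name"
  done
}

# Example (inside the sdk-cli pod):
# check_triton_repo /common/models/triton-model-repo
```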

Relevant log output

Triton Logs

E0604 16:12:24.504123 1 model_repository_manager.cc:1335] Poll failed for model directory 'phishing-bert-onnx': Invalid model name: Could not determine backend for model 'phishing-bert-onnx' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.

Deployment Creation Logs

Successfully registered model 'phishing-bert-onnx'.
Created version '1' of model 'phishing-bert-onnx'.
/mlflow/artifacts/0/4281c565f9ef489880c9940e35992f54/artifacts
Saved mlflow-meta.json to /common/triton-model-repo/phishing-bert-onnx
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow_triton/deployments.py", line 115, in create_deployment
    self.triton_client.load_model(name)
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/tritonclient/http/_client.py", line 669, in load_model
    _raise_if_error(response)
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/tritonclient/http/_utils.py", line 69, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: [500] failed to load 'phishing-bert-onnx', failed to poll from model repository

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow/deployments/cli.py", line 151, in create_deployment
    deployment = client.create_deployment(name, model_uri, flavor, config=config_dict)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow_triton/deployments.py", line 117, in create_deployment
    raise MlflowException(str(ex))
mlflow.exceptions.MlflowException: [500] failed to load 'phishing-bert-onnx', failed to poll from model repository
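The Triton error above ("no backend in model configuration") means the server found no `backend` or `platform` field in the model's config.pbtxt; when the config is missing or empty, Triton falls back to inferring the backend from a model file named model.<backend_name>, which matches the "Expected model name of the form" wording. For comparison only — not the actual phishing-bert-onnx configuration — a minimal config.pbtxt for an ONNX model looks roughly like:

```
name: "phishing-bert-onnx"
backend: "onnxruntime"
max_batch_size: 32
```

Checking whether config.pbtxt survived the `cp -RL` copy into /common may narrow down whether this is a packaging issue or a copy issue.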

Full env printout

No response

Other/Misc.

No response

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report