Implement Triton integration with KNative Serving
Context
Integrate NVIDIA Triton for serving, via KServe, with GPUs.
What needs to get done
Set the required configurations to deploy an ISVC with Triton on a GPU.
From #169 and #170, the configurations are:

In the `config-deployment` ConfigMap:
- `progress-deadline` set to `"600s"`
- `registries-skipping-tag-resolving` set to `"nvcr.io"`

In the `config-features` ConfigMap:
- `kubernetes.podspec-affinity: "enabled"`
- `kubernetes.podspec-nodeselector: "enabled"`
- `kubernetes.podspec-tolerations: "enabled"`
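For reference, a minimal sketch of what the two ConfigMaps would look like with these values set (assuming Knative Serving is installed in the `knative-serving` namespace):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving  # assumed Serving install namespace
data:
  progress-deadline: "600s"
  registries-skipping-tag-resolving: "nvcr.io"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-features
  namespace: knative-serving
data:
  kubernetes.podspec-affinity: "enabled"
  kubernetes.podspec-nodeselector: "enabled"
  kubernetes.podspec-tolerations: "enabled"
```

Note that when Serving is managed by the Knative Operator (as in Charmed Kubeflow), these values are set through the KnativeServing resource rather than by editing the ConfigMaps directly; see the manifest further below.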
Definition of Done
PR to enable configuring Serving in Charmed Kubeflow for NVIDIA Triton is merged.
Thank you for reporting your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5251.
This message was autogenerated
My proposed plan is to handle this in 2 ways:
- Keep using the current `knative-local-gateway` and the K8s ExternalName Service for requests to the ISVC, to avoid needing Dex cookies (sketched below).
- Extend Knative to allow configuring affinities and tolerations, so that the ISVC Pod can be scheduled on the GPU node: https://kserve.github.io/website/master/modelserving/nodescheduling/inferenceservicenodescheduling/
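For illustration only, a hypothetical sketch of the ExternalName Service pattern from the first point; the Service name, namespaces, and gateway address are assumptions and would need to match the actual Istio setup:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: user-namespace  # hypothetical user namespace
spec:
  type: ExternalName
  # Route in-namespace requests to the cluster-local gateway, bypassing
  # the external ingress (and therefore the Dex auth flow).
  externalName: knative-local-gateway.istio-system.svc.cluster.local
```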
I propose to expose the above Knative settings as config options in the Knative charms. I would even propose enabling this by default, to have a UX similar to Notebooks.
So most of the implementation in this effort is to configure Knative for GPU node scheduling, and then to provide a tutorial that shows how to create and reach an ISVC that uses a GPU.
Based on the spec, the implementation will be as follows. Note: this is the section from the spec on the chosen approach; for more details, refer to spec KF084.
- Set the required configurations by default, so that the user does not have to change them for Triton.
- Add a config option for registries-skip-tag-resolution, because users may need to add their own registries to the list. As mentioned in the point above, the default for this config will be "nvcr.io", since we know that most users would not want tag resolution on the `nvcr.io` registry. The config.yaml will be:
```yaml
options:
  registries-skip-tag-resolution:
    default: "nvcr.io"
    description: Comma-separated list of repositories for which tag to digest resolving should be skipped.
    type: string
```
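As a usage sketch, assuming the charm is deployed under the application name `knative-serving`, an admin could extend the list at runtime:

```sh
# Skip tag resolution for nvcr.io plus a private registry (example value)
juju config knative-serving registries-skip-tag-resolution="nvcr.io,registry.example.com"
```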
- Add a config option for progress-deadline with a default of "600s"; this way users don't need to change it for Triton, but still have the option to adjust the timeout to suit their deployments. The option in the config.yaml will be:
```yaml
options:
  progress-deadline:
    default: "600s"
    description: The duration to wait for the deployment to be ready before considering it failed.
    type: string
```
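Similarly, a usage sketch for environments where image pulls or model downloads are slow (charm name again assumed):

```sh
# Raise the deadline beyond the default (example value)
juju config knative-serving progress-deadline="900s"
```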
- Change the KnativeServing manifest to have the GPU PodSpec settings enabled by default and to take in the values of both config options defined in the previous points:
```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: {{ app_name }}
  namespace: {{ serving_namespace }}
spec:
  version: {{ serving_version }}
  config:
    deployment:
      progress-deadline: {{ progress_deadline }}
      registries-skipping-tag-resolving: {{ registries_skip_tag_resolving }}
    features:
      kubernetes.podspec-affinity: "enabled"
      kubernetes.podspec-nodeselector: "enabled"
      kubernetes.podspec-tolerations: "enabled"
```
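To sanity-check that the operator propagated these values into the underlying ConfigMaps, something like the following could be used (Serving namespace assumed to be `knative-serving`):

```sh
# Both commands should echo the configured values
kubectl get configmap config-deployment -n knative-serving \
  -o jsonpath='{.data.progress-deadline}'
kubectl get configmap config-features -n knative-serving \
  -o jsonpath='{.data.kubernetes\.podspec-affinity}'
```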
Testing

progress-deadline
- Deploy an example ISVC named `<ISVC name>`.
- Get the Deployment with the label `serving.kserve.io/inferenceservice: <ISVC name>`.
- Assert in the Deployment `.spec`: `progressDeadlineSeconds: <config value>`.
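A sketch of that assertion with kubectl; the `kserve-test` namespace and the 600-second value are assumptions:

```sh
# progress-deadline: "600s" surfaces on the Deployment as progressDeadlineSeconds: 600
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=<ISVC name> \
  -o jsonpath='{.items[0].spec.progressDeadlineSeconds}'
```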
podspec-[nodeselector, affinity, tolerations]
- Deploy an example ISVC named `<ISVC name>` with nodeSelector, affinity, and tolerations set, for example:
```sh
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: disktype
              operator: In
              values:
              - ssd
    nodeSelector:
      myLabel1: "true"
    tolerations:
    - key: "myTaint1"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
- Get the Deployment with the label `serving.kserve.io/inferenceservice: <ISVC name>`.
- Assert on `affinity`, `nodeSelector`, and `tolerations` in the Deployment's Pod template spec (`.spec.template.spec`).
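A sketch of those assertions, assuming the `sklearn-iris` example above deployed in `kserve-test`:

```sh
# Each command should echo back the scheduling values set on the predictor
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.nodeSelector}'
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.tolerations}'
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.affinity}'
```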