Implement Triton integration with KNative Serving
Context
Integrate NVIDIA Triton for serving, via KServe, with GPUs.
What needs to get done
Set the required configurations to deploy an ISVC with Triton on a GPU.
From #169 and #170, the configurations are:

In the `config-deployment` ConfigMap:
- `progress-deadline` set to `"600s"`
- `registries-skipping-tag-resolving` set to `"nvcr.io"`

In the `config-features` ConfigMap:
- `kubernetes.podspec-affinity: "enabled"`
- `kubernetes.podspec-nodeselector: "enabled"`
- `kubernetes.podspec-tolerations: "enabled"`
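For reference, a minimal sketch of what the two ConfigMaps would look like with these values set (assuming Knative Serving is installed in the `knative-serving` namespace):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving  # assumed Serving install namespace
data:
  progress-deadline: "600s"
  registries-skipping-tag-resolving: "nvcr.io"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-features
  namespace: knative-serving
data:
  kubernetes.podspec-affinity: "enabled"
  kubernetes.podspec-nodeselector: "enabled"
  kubernetes.podspec-tolerations: "enabled"
```

Note that when Serving is managed by the Knative Operator (as in Charmed Kubeflow), these values are set through the KnativeServing resource rather than by editing the ConfigMaps directly; see the manifest further below.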
Definition of Done
PR to enable configuring Serving in Charmed Kubeflow for NVIDIA Triton is merged.
Thank you for reporting your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5251.
This message was autogenerated
My proposed plan is to handle this in 2 ways:
- Keep using the current `knative-local-gateway` and the K8s ExternalName Service for requests to the ISVC, to avoid needing Dex cookies (sketched below).
- Extend Knative to allow configuring affinities and tolerations, so that the ISVC Pod can be scheduled on the GPU node: https://kserve.github.io/website/master/modelserving/nodescheduling/inferenceservicenodescheduling/
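For illustration only, a hypothetical sketch of the ExternalName Service pattern from the first point; the Service name, namespaces, and gateway address are assumptions and would need to match the actual Istio setup:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: user-namespace  # hypothetical user namespace
spec:
  type: ExternalName
  # Route in-namespace requests to the cluster-local gateway, bypassing
  # the external ingress (and therefore the Dex auth flow).
  externalName: knative-local-gateway.istio-system.svc.cluster.local
```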
I propose to expose the above Knative settings as config options in the Knative charms. I would even propose enabling this by default, to have a UX similar to Notebooks.
So most of the implementation in this effort is to configure Knative for GPU node scheduling, and then to provide a tutorial that shows how to create and reach an ISVC that uses a GPU.
Based on the spec, the implementation will be as follows. Note: this is the section from the spec on the chosen approach; for more details, refer to spec KF084.
- Set the required configurations by default, so that the user does not have to change them for Triton.
- Add a config option for registries-skip-tag-resolution, because users may need to add their own registries to the list. As mentioned in the point above, the default for this config will be "nvcr.io", since we know that most users would not want tag resolution on the `nvcr.io` registry. The config.yaml will be:
```yaml
options:
  registries-skip-tag-resolution:
    default: "nvcr.io"
    description: Comma-separated list of repositories for which tag to digest resolving should be skipped.
    type: string
```
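As a usage sketch, assuming the charm is deployed under the application name `knative-serving`, an admin could extend the list at runtime:

```sh
# Skip tag resolution for nvcr.io plus a private registry (example value)
juju config knative-serving registries-skip-tag-resolution="nvcr.io,registry.example.com"
```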
- Add a config option for progress-deadline with a default of "600s"; this way users don't need to change it for Triton, but still have the option to adjust the timeout to suit their deployments. The option in the config.yaml will be:
```yaml
options:
  progress-deadline:
    default: "600s"
    description: The duration to wait for the deployment to be ready before considering it failed.
    type: string
```
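Similarly, a usage sketch for environments where image pulls or model downloads are slow (charm name again assumed):

```sh
# Raise the deadline beyond the default (example value)
juju config knative-serving progress-deadline="900s"
```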
- Change the KnativeServing manifest to have the GPU PodSpec settings enabled by default and to take in the values of both config options defined in the previous points:
```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: {{ app_name }}
  namespace: {{ serving_namespace }}
spec:
  version: {{ serving_version }}
  config:
    deployment:
      progress-deadline: {{ progress_deadline }}
      registries-skipping-tag-resolving: {{ registries_skip_tag_resolving }}
    features:
      kubernetes.podspec-affinity: "enabled"
      kubernetes.podspec-nodeselector: "enabled"
      kubernetes.podspec-tolerations: "enabled"
```
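To sanity-check that the operator propagated these values into the underlying ConfigMaps, something like the following could be used (Serving namespace assumed to be `knative-serving`):

```sh
# Both commands should echo the configured values
kubectl get configmap config-deployment -n knative-serving \
  -o jsonpath='{.data.progress-deadline}'
kubectl get configmap config-features -n knative-serving \
  -o jsonpath='{.data.kubernetes\.podspec-affinity}'
```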
Testing

progress-deadline
- Deploy an example ISVC named `<ISVC name>`.
- Get the Deployment with the label `serving.kserve.io/inferenceservice: <ISVC name>`.
- Assert in the Deployment `.spec`: `progressDeadlineSeconds: <config value>`.
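A sketch of that assertion with kubectl; the `kserve-test` namespace and the 600-second value are assumptions:

```sh
# progress-deadline: "600s" surfaces on the Deployment as progressDeadlineSeconds: 600
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=<ISVC name> \
  -o jsonpath='{.items[0].spec.progressDeadlineSeconds}'
```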
podspec-[nodeselector, affinity, tolerations]
- Deploy an example ISVC named `<ISVC name>` with nodeSelector, affinity, and tolerations set, for example:
```sh
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: disktype
              operator: In
              values:
              - ssd
    nodeSelector:
      myLabel1: "true"
    tolerations:
    - key: "myTaint1"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```
- Get the Deployment with the label `serving.kserve.io/inferenceservice: <ISVC name>`.
- Assert on `affinity`, `nodeSelector`, and `tolerations` in the Deployment's Pod template spec (`.spec.template.spec`).
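A sketch of those assertions, assuming the `sklearn-iris` example above deployed in `kserve-test`:

```sh
# Each command should echo back the scheduling values set on the predictor
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.nodeSelector}'
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.tolerations}'
kubectl get deployment -n kserve-test \
  -l serving.kserve.io/inferenceservice=sklearn-iris \
  -o jsonpath='{.items[0].spec.template.spec.affinity}'
```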