SeldonIO/seldon-core

Local kind deployment fails due to HorizontalPodAutoscaler

creativedutchmen opened this issue · 19 comments

Describe the bug

When applying an example SeldonDeployment in a fresh kind cluster, the deployment doesn't come online because of the following error: no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta1".

Looking at the installed versions for the scaler, this version is indeed not present:

$ kubectl api-versions | grep autoscaling
autoscaling/v1
autoscaling/v2
autoscaling/v2beta2
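
The same check can be scripted. A minimal sketch, assuming the output of `kubectl api-versions` has been captured as text; the helper names `served_hpa_versions` and `pick_hpa_version` and the preference order are illustrative and not part of Seldon Core or kubectl:

```python
from typing import Optional, Set

# Illustrative preference order, newest first. autoscaling/v1 is left out
# because it only supports a CPU utilization target.
PREFERRED = ["autoscaling/v2", "autoscaling/v2beta2", "autoscaling/v2beta1"]


def served_hpa_versions(api_versions_output: str) -> Set[str]:
    """Return the autoscaling group versions listed in `kubectl api-versions` output."""
    return {
        line.strip()
        for line in api_versions_output.splitlines()
        if line.strip().startswith("autoscaling/")
    }


def pick_hpa_version(api_versions_output: str) -> Optional[str]:
    """Pick the newest preferred HPA API version the cluster serves, or None."""
    served = served_hpa_versions(api_versions_output)
    for version in PREFERRED:
        if version in served:
            return version
    return None


# Output reported in this issue (Kubernetes 1.25.0).
sample = """autoscaling/v1
autoscaling/v2
autoscaling/v2beta2
"""
print("autoscaling/v2beta1" in served_hpa_versions(sample))  # False
print(pick_hpa_version(sample))  # autoscaling/v2
```

Discovering what the cluster actually serves, rather than hard-coding one version, is what would let a single build work on both older and newer clusters.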

To reproduce

Follow instructions in the Quickstart, then deploy a model

Expected behaviour

The deployment doesn't throw an error and creates a working endpoint

Environment

Local installation on a M1 Mac
Kubernetes 1.25.0
seldon helm chart from https://storage.googleapis.com/seldon-charts
value: docker.io/seldonio/seldon-core-executor:1.14.1
image: docker.io/seldonio/seldon-core-operator:1.14.1

Model Details

Model deployment:

kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: modelapis
spec:
  name: iris
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v1.15.0-dev/sklearn/iris
      name: classifier
    name: default
    replicas: 2

This PR looks to fix this but would need further work to allow us to be backwards compatible with v2beta1.

Is there any workaround (use k8s < 1.25) or do we need to wait for the fix?

For now you would need to use k8s < 1.25 if you wanted to use HPA. You could try KEDA instead which would create HPAs itself and allow you to use 1.25+ now.

Our Kubernetes cluster will be upgraded to v1.25 in March 2023. Is there a realistic chance that Seldon Core 1.16 will be released before that? Otherwise we would need to patch it somehow ourselves. Your response will be very helpful for our planning. Thanks a lot!

At present, I don't think we can commit to that timeline. When we do this update, we need to determine whether we just go for HPA v1, as this also raises the minimum k8s version to 1.23.

Thanks a lot for your quick reply. Good to know. Sounds like we could go for a temporary patch using HPA v1 ourselves until Seldon Core 1.16 is released?

Hi @cliveseldon,
Using KEDA does not seem like an option, as even simple SeldonDeployments that don't use HPA fail with a runtime error on 1.25. I tried the sklearn iris example.

$ kubectl apply -f https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/models/sklearn_iris/sklearn_iris_deployment.yaml -n seldon

seldondeployment.machinelearning.seldon.io/seldon-deployment-example created

❯ kubectl get sdep -n seldon seldon-deployment-example -oyaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"machinelearning.seldon.io/v1alpha2","kind":"SeldonDeployment","metadata":{"annotations":{},"name":"seldon-deployment-example","namespace":"seldon"},"spec":{"name":"sklearn-iris-deployment","predictors":[{"componentSpecs":[{"spec":{"containers":[{"image":"seldonio/sklearn-iris:0.3","imagePullPolicy":"IfNotPresent","name":"sklearn-iris-classifier"}]}}],"graph":{"children":[],"endpoint":{"type":"REST"},"name":"sklearn-iris-classifier","type":"MODEL"},"name":"sklearn-iris-predictor","replicas":1}]}}
  creationTimestamp: "2022-12-28T10:34:01Z"
  generation: 1
  name: seldon-deployment-example
  namespace: seldon
  resourceVersion: "5682"
  uid: 5162ae75-5351-49e3-b308-0adf8aac9fde
spec:
  name: sklearn-iris-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/sklearn-iris:0.3
          imagePullPolicy: IfNotPresent
          name: sklearn-iris-classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: sklearn-iris-classifier
      type: MODEL
    name: sklearn-iris-predictor
    replicas: 1
status:
  address:
    url: http://seldon-deployment-example-sklearn-iris-predictor.seldon.svc.cluster.local:8000/api/v1.0/predictions
  conditions:
  - lastTransitionTime: "2022-12-28T10:34:01Z"
    reason: Not all services created
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-12-28T10:34:01Z"
    reason: Not all services created
    status: "False"
    type: ServicesReady
  description: no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta1"
  state: Failed

The seldon controller pod has the below error:

{"level":"error","ts":1672223980.172152,"logger":"controller.seldon-controller-manager","msg":"Reconciler error","reconciler group":"machinelearning.seldon.io","reconciler kind":"SeldonDeployment","name":"seldon-deployment-example","namespace":"seldon","error":"no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2beta1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"}
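
When scanning controller logs for this failure, the missing kind and version can be pulled out of the JSON line. A minimal sketch, assuming one JSON object per log line with an "error" field as in the line above; the function name `missing_kind` is hypothetical:

```python
import json
import re
from typing import Optional, Tuple

# Matches the API-discovery error text seen in this issue.
NO_MATCH = re.compile(r'no matches for kind "([^"]+)" in version "([^"]+)"')


def missing_kind(log_line: str) -> Optional[Tuple[str, str]]:
    """Return (kind, apiVersion) if a JSON log line reports an unserved kind."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = NO_MATCH.search(str(entry.get("error", "")))
    return (match.group(1), match.group(2)) if match else None


# Shortened copy of the controller log line above.
line = (
    '{"level":"error","msg":"Reconciler error",'
    '"name":"seldon-deployment-example","namespace":"seldon",'
    '"error":"no matches for kind \\"HorizontalPodAutoscaler\\" '
    'in version \\"autoscaling/v2beta1\\""}'
)
print(missing_kind(line))  # ('HorizontalPodAutoscaler', 'autoscaling/v2beta1')
```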

Just release it; people on older k8s can use the older Seldon build. Right now you're making it so people on newer K8s cannot use Seldon, which is much worse.

The PR is being worked on and should be in next release

@cliveseldon Great news. Thanks a lot. Can you share a release date already?

At present I would hope for sometime before the end of March, but maybe earlier if we can get all the V1 PRs in and updating the docs for upgrading goes smoothly.

@cliveseldon
Perfect. I think we can wait for your release then and don't need our own workaround. Thanks again.

Looking at #4172 I believe the scope is pretty big. I think it is possible to make a smaller patch to only use autoscaling/v2 when calling Kubernetes API with changes limited to seldondeployment_controller.go. Would you be interested in that? If yes, then I can work on a PR. If it works out, then, perhaps, you can do a patch release. @cliveseldon

@NovemberZulu Would the idea be to use autoscalingv2beta1.MetricSpec in the SeldonDeployment and translate it to a v2.MetricSpec before sending it over the wire?

If so, the issue would be if v2.MetricSpec allowed functionality to be expressed that can't be expressed in autoscalingv2beta1.MetricSpec.

@cliveseldon "sending over the wire" as in "calling Kubernetes API", yes, that's the idea.
You are absolutely right that it's not possible to fully convert autoscalingv2.MetricSpec to autoscalingv2beta1.MetricSpec, but looking at the code I believe that we only need to convert autoscalingv2beta1.MetricSpec to autoscalingv2.MetricSpec, and that should be doable.
Of course, it is very possible that I miss something important, please correct me if I am wrong. Thanks!
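
The one-way conversion described above could look roughly like this. This is a minimal sketch on plain dictionaries (as parsed from YAML), not the Go code in seldondeployment_controller.go and not the approach taken in the PR. The function names are hypothetical, and the field names are assumed from the Kubernetes autoscaling/v2beta1 and autoscaling/v2 API references. Object metrics are left out.

```python
from typing import Any, Dict, Optional

Metric = Dict[str, Any]


def metric_target(utilization: Optional[int] = None,
                  average_value: Optional[str] = None,
                  value: Optional[str] = None) -> Dict[str, Any]:
    """Build a v2 MetricTarget from whichever v2beta1 target field was set."""
    if utilization is not None:
        return {"type": "Utilization", "averageUtilization": utilization}
    if average_value is not None:
        return {"type": "AverageValue", "averageValue": average_value}
    if value is not None:
        return {"type": "Value", "value": value}
    raise ValueError("metric has no target")


def metric_identifier(name: str, selector: Optional[dict] = None) -> Dict[str, Any]:
    """Build a v2 MetricIdentifier; the selector is optional."""
    ident: Dict[str, Any] = {"name": name}
    if selector is not None:
        ident["selector"] = selector
    return ident


def convert_metric(old: Metric) -> Metric:
    """Convert one v2beta1-style metric dict to its v2-style shape."""
    kind = old["type"]
    if kind == "Resource":
        src = old["resource"]
        return {
            "type": kind,
            "resource": {
                "name": src["name"],
                "target": metric_target(
                    utilization=src.get("targetAverageUtilization"),
                    average_value=src.get("targetAverageValue"),
                ),
            },
        }
    if kind == "Pods":
        src = old["pods"]
        return {
            "type": kind,
            "pods": {
                "metric": metric_identifier(src["metricName"], src.get("selector")),
                "target": metric_target(average_value=src["targetAverageValue"]),
            },
        }
    if kind == "External":
        src = old["external"]
        return {
            "type": kind,
            "external": {
                "metric": metric_identifier(src["metricName"], src.get("metricSelector")),
                "target": metric_target(
                    average_value=src.get("targetAverageValue"),
                    value=src.get("targetValue"),
                ),
            },
        }
    raise NotImplementedError(f"metric type {kind!r} is not covered by this sketch")


v2beta1_cpu = {"type": "Resource",
               "resource": {"name": "cpu", "targetAverageUtilization": 50}}
print(convert_metric(v2beta1_cpu))
```

Going in this direction is lossless for the cases shown because each flat v2beta1 target field maps onto exactly one v2 target type; the reverse direction is where information could be lost.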

It would be great if there were a no-HPA fix for this asap, for those who must support kube 1.25 but could do without HPA. Waiting to fix this only when HPA is fully figured out would impact such projects.

Might this be a use-case for a Conversion Webhook? https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#write-a-conversion-webhook-server

Hi @cliveseldon - Following up on this, would you be able to provide a tentative release date?

The PR is complete and being reviewed. Once in we will triage the next release.

Thanks. We're waiting for the release before we can move to production with Seldon