Error deploying the SeldonDeployment in namespace other than kubeflow
Closed this issue · 3 comments
Trying to deploy a SeldonDeployment:
$ microk8s kubectl apply -f model-mlflow-local.yaml -n kubeflow
seldondeployment.machinelearning.seldon.io/mlflow created
$ microk8s kubectl apply -f model-mlflow-local.yaml -n admin
Error from server (InternalError): error when creating "model-mlflow-local.yaml": Internal error occurred: failed calling webhook "v1alpha2.vseldondeployment.kb.io": Post "https://seldon-webhook-service.kubeflow.svc:4443/validate-machinelearning-seldon-io-v1alpha2-seldondeployment?timeout=30s": dial tcp 10.152.183.150:4443: connect: connection refused
I have the same secret in both of the namespaces:
$ microk8s kubectl get secrets -A | grep seldon-init-container-secret
admin seldon-init-container-secret Opaque 6 114m
kubeflow seldon-init-container-secret Opaque 6 114m
yaml file (requires changing the modelUri)
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
name: mlflow
spec:
name: wines
predictors:
- componentSpecs:
- spec:
containers:
- name: classifier
livenessProbe:
initialDelaySeconds: 80
failureThreshold: 200
periodSeconds: 5
successThreshold: 1
httpGet:
path: /health/ping
port: http
scheme: HTTP
readinessProbe:
initialDelaySeconds: 80
failureThreshold: 200
periodSeconds: 5
successThreshold: 1
httpGet:
path: /health/ping
port: http
scheme: HTTP
graph:
children: []
implementation: MLFLOW_SERVER
modelUri: s3://mlflow/71/8476095066fd43af8ae2a6f1511044df/artifacts/model
envSecretRefName: seldon-init-container-secret
name: classifier
name: wine-super-model
replicas: 1
Deployment was done using kubeflow-lite bundle + mlflow (stable channel). RBAC is enabled on microk8s.
I think this is because we haven't added these permissions to the aggregate roles applied to each Kubeflow user. We had a similar problem for other components (Katib, kfp, ...) when moving to multi-user and solved it with the kubeflow-roles-operator. Likely we can fix this the same way.
Nevermind, my first guess was completely wrong.
Instead I believe the problem might be that for namespaces with the label serving.kubeflow.org/inferenceservice: enabled
, they trigger a webhook that goes through the seldon-webhook-service
and one of that webhook's selector
s is control-plane: seldon-controller-manager
, but the pod providing the webhook does not have that label. Looking into what we should be labelling things now, but as a quick fix either of these works:
- edit the
seldon-webhook-service
and delete that control-plane label - remove the
serving.kubeflow.org/inferenceservice: enabled
label from theadmin
namespace
Looking into what the webhook is supposed to actually do, etc, to see how we should configure things to fix permanently.
Turns out this is actually fixed in edge by #14, but not pushed to stable. I'll close this but if you hit it again please reopen