canonical/seldon-core-operator

juju refresh doesn't update seldon-webhook-service definition

Closed this issue · 3 comments

Found during testing canonical/bundle-kubeflow#462.

The webhook label fix in #14 is only applied for a greenfield deployment with the new revision of the charm. In an upgrade scenario, the existing service definition is not updated, so the original issue remains.

$ juju info seldon-core
name: seldon-core
charm-id: ZGHtHpN4TqAzrUlh9aG1SWxXenopHFRH
...
channels: |
  latest/stable:     52  2022-01-25  (52)  1MB
  latest/candidate:  ↑
  latest/beta:       ↑
  latest/edge:       58  2022-06-01  (58)  7MB

[stable: revision 52]

$ juju deploy seldon-core seldon-controller-manager
Located charm "seldon-core" in charm-hub, revision 52
Deploying "seldon-controller-manager" from charm-hub charm "seldon-core", revision 52 in channel stable on focal

$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager,control-plane=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.28
IPs:               10.152.183.28
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

-> the selector includes control-plane=seldon-controller-manager.

[upgrade from 52 to 58]

$ juju refresh seldon-controller-manager --channel edge
Added charm-hub charm "seldon-core", revision 58 in channel edge, to the model
Adding endpoint "grafana-dashboard" to default space "alpha"
Adding endpoint "metrics-endpoint" to default space "alpha"
Leaving endpoints in "alpha": ambassador, istio, keda

$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager,control-plane=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.28
IPs:               10.152.183.28
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

-> control-plane=seldon-controller-manager is still there even after the refresh.

[edge: revision 58]

$ juju deploy seldon-core seldon-controller-manager --channel edge
Located charm "seldon-core" in charm-hub, revision 58
Deploying "seldon-controller-manager" from charm-hub charm "seldon-core", revision 58 in channel edge on focal

$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service 
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.84
IPs:               10.152.183.84
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         10.1.60.16:4443
Session Affinity:  None
Events:            <none>

-> as expected, there is no control-plane=seldon-controller-manager.

[workaround]

$ microk8s kubectl -n kubeflow patch service/seldon-webhook-service --type=json \
    -p='[{"op": "remove", "path": "/spec/selector/control-plane"}]'
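For reference, the same JSON Patch can be derived instead of hand-written. This is only a minimal sketch: the helper names `escape_pointer_token` and `selector_removal_patch` are made up for illustration, and the two selector dicts are copied from the `describe` outputs above. It emits a `remove` op for every selector key that is present on the live service but absent from the desired one, escaping keys per RFC 6901 (`~` becomes `~0`, `/` becomes `~1`), which matters for keys such as `app.kubernetes.io/name`.

```python
import json


def escape_pointer_token(key: str) -> str:
    """Escape one JSON Pointer reference token (RFC 6901)."""
    # "~" must be escaped first, otherwise the "~1" produced for "/" would be re-escaped.
    return key.replace("~", "~0").replace("/", "~1")


def selector_removal_patch(current: dict, desired: dict) -> list:
    """Build JSON Patch ops that remove selector keys missing from `desired`."""
    return [
        {"op": "remove", "path": "/spec/selector/" + escape_pointer_token(key)}
        for key in sorted(current)
        if key not in desired
    ]


# Selector seen after the refresh from revision 52 (stale).
current = {
    "app.kubernetes.io/name": "seldon-controller-manager",
    "control-plane": "seldon-controller-manager",
}
# Selector seen on a fresh deployment of revision 58.
desired = {"app.kubernetes.io/name": "seldon-controller-manager"}

patch = selector_removal_patch(current, desired)
print(json.dumps(patch))
# prints: [{"op": "remove", "path": "/spec/selector/control-plane"}]
```

The printed value is the same payload passed to `kubectl patch --type=json` in the workaround.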

I think this service gets deployed by the seldon operator; perhaps if it sees that the service already exists, it does not try to overwrite it? Or it might be that the new operator deployment doesn't own the service and cannot edit it. We should check the seldon operator pod logs.

In any case, this needs further investigation. The first step should be to write an integration test for the upgrade path that would have caught this.
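A rough sketch of the check such an upgrade test could make after `juju refresh`: a Service only gets endpoints for pods whose labels satisfy every key/value pair in its selector, which would explain `Endpoints: <none>` in the upgraded case. The function name `selector_matches` and the pod labels below are assumptions for illustration (the real pod labels were not captured in this issue); a real test would read both the selector and the labels from the cluster.

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """Return True if a pod with `pod_labels` would be selected by `selector`.

    Equality-based matching: every selector pair must be present on the pod.
    An empty selector is treated as "no match" here, since a Service without
    a selector does not get endpoints populated from pods automatically.
    """
    return bool(selector) and all(
        pod_labels.get(key) == value for key, value in selector.items()
    )


# Hypothetical labels on the operator pod deployed by the new charm revision.
pod_labels = {"app.kubernetes.io/name": "seldon-controller-manager"}

# Selector left behind after refreshing from revision 52.
stale_selector = {
    "app.kubernetes.io/name": "seldon-controller-manager",
    "control-plane": "seldon-controller-manager",
}
# Selector created by a fresh deployment of revision 58.
fresh_selector = {"app.kubernetes.io/name": "seldon-controller-manager"}

print(selector_matches(stale_selector, pod_labels))  # prints: False
print(selector_matches(fresh_selector, pod_labels))  # prints: True
```

An upgrade test would deploy from stable, refresh to the candidate revision, and then assert that this check holds (or, equivalently, that the service has at least one endpoint).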

Seldon pod logs on refresh: https://pastebin.canonical.com/p/YZH2FY6DpT/
It was unrelated.

This issue is not relevant in the context of a 1.6 to 1.7 upgrade, or of a 1.7 or 1.6 greenfield deployment. It concerns KF v1.4.
Won't fix.