[VirtualCluster] Error creating: failed to list services from cluster xxxxx cache: service is not ready
What steps did you take and what happened:
I followed the vc demo doc here to create a virtual cluster.
But when I tried to create a pod in the vc, the pod stayed Pending.
Describing the pod in the vc:
kubectl --kubeconfig vc-1.kubeconfig describe po test-deploy-5fbd8f7c8-mnzhx
output:
Name:             test-deploy-5fbd8f7c8-mnzhx
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=vc-test
                  pod-template-hash=5fbd8f7c8
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/test-deploy-5fbd8f7c8
Containers:
  poc:
    Image:        busybox
    Port:         <none>
    Host Port:    <none>
    Command:
      top
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqgp2 (ro)
Volumes:
  kube-api-access-qqgp2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason        Age                From       Message
  ----     ------        ----               ----       -------
  Warning  FailedCreate  6s (x12 over 16s)  vc-syncer  Error creating: failed to list services from cluster default-25ca04-vc-sample-1 cache: service is not ready
What did you expect to happen:
The pod should be running.
Anything else you would like to add:
It seems that the error occurs here: https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/4cc1422fd340d19c0d21b4f336dc664e45a12954/virtualcluster/pkg/syncer/resources/pod/dws.go#L334
Why do we need to check services in pPod.Namespace? There are apparently no services in that namespace unless I create one manually, because that namespace was just created by the syncer.
Is this a bug or is there anything I missed?
Environment:
- cluster-api-provider-nested version: none
- Minikube/KIND version: kind v0.16.0 go1.18.3 darwin/arm64
- Kubernetes version (use kubectl version): 1.25.2
- OS (e.g. from /etc/os-release):
/kind bug
Hi, @LuBingtan. Which versions of the virtual cluster and root cluster are you running? Also, I don't follow this part: "There is apparently no services in that namespace". That shouldn't be the case; the namespace should have at least kubernetes.default.svc.
Hi, I retried and found that the root cause might be that kubernetes.default.svc failed to sync.
Error logs:
E1108 04:32:15.064617 1 dws.go:65] failed reconcile service default/kubernetes CREATE of cluster default-3a3ae6-vc-sample-1 Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.32.0.1"}: must be empty when `clusterIP` is not specified
E1108 04:32:15.064652 1 mccontroller.go:476] default/kubernetes dws request reconcile failed: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.32.0.1"}: must be empty when `clusterIP` is not specified
It looks like ClusterIPs should also be reset before creating the Service.
https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/4cc1422fd340d19c0d21b4f336dc664e45a12954/virtualcluster/pkg/syncer/conversion/mutate.go#L422-L431
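The proposed fix can be sketched as follows. ServiceSpec is a simplified stand-in for corev1.ServiceSpec (the real type does have ClusterIP and ClusterIPs fields), validateClusterIPs approximates only the single apiserver rule quoted in the error log, and resetClusterIPs is a hypothetical helper, not the code in mutate.go. Leaving headless services untouched is also an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceSpec keeps only the two corev1.ServiceSpec fields involved here.
type ServiceSpec struct {
	ClusterIP  string
	ClusterIPs []string
}

// validateClusterIPs approximates one apiserver rule from the log:
// clusterIPs must be empty when clusterIP is not specified.
// It is not the full upstream validation.
func validateClusterIPs(spec ServiceSpec) error {
	if spec.ClusterIP == "" && len(spec.ClusterIPs) != 0 {
		return errors.New("spec.clusterIPs: must be empty when `clusterIP` is not specified")
	}
	return nil
}

// resetClusterIPs drops the tenant-allocated addresses so the root
// cluster allocates its own.
func resetClusterIPs(spec *ServiceSpec) {
	if spec.ClusterIP == "None" {
		return // assumption: headless services keep "None" in both fields
	}
	spec.ClusterIP = ""
	spec.ClusterIPs = nil // the proposed addition
}

func main() {
	// Tenant copy of default/kubernetes, values taken from the log.
	tenant := ServiceSpec{ClusterIP: "10.32.0.1", ClusterIPs: []string{"10.32.0.1"}}

	// Current behavior: only ClusterIP is cleared, so validation fails.
	onlyClusterIP := tenant
	onlyClusterIP.ClusterIP = ""
	fmt.Println(validateClusterIPs(onlyClusterIP))

	// Proposed behavior: both fields are cleared, so validation passes.
	fixed := tenant
	resetClusterIPs(&fixed)
	fmt.Println(validateClusterIPs(fixed))
}
```

Since Kubernetes 1.20 the apiserver fills in spec.clusterIPs alongside spec.clusterIP, so a Service read from a tenant of that version or later carries both, and clearing only ClusterIP produces exactly the combination the root apiserver rejects.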
@wondywang What do you think? If this sounds OK, I can help fix it.
And FYI:
- root cluster version: 1.25.2
  serverVersion:
    buildDate: "2022-09-22T05:28:27Z"
    compiler: gc
    gitCommit: 5835544ca568b757a8ecae5c153f317e5736700e
    gitTreeState: clean
    gitVersion: v1.25.2
    goVersion: go1.19.1
    major: "1"
    minor: "25"
    platform: linux/arm64
- virtual cluster version: v1.22.13
  serverVersion:
    buildDate: "2022-08-17T18:23:45Z"
    compiler: gc
    gitCommit: a43c0904d0de10f92aa3956c74489c45e6453d6e
    gitTreeState: clean
    gitVersion: v1.22.13
    goVersion: go1.16.15
    major: "1"
    minor: "22"
    platform: linux/arm64
thanks @LuBingtan
PTAL @Fei-Guo @christopherhein. It does seem necessary to reset ClusterIPs here, and we already do that internally.
Yes, looks like a bug. @wondywang can you fix it?
Agreed, seems like a great catch. thanks!
ok, i will