kubernetes-retired/cluster-api-provider-nested

[VirtualCluster] Error creating: failed to list services from cluster xxxxx cache: service is not ready

Closed this issue · 6 comments

What steps did you take and what happened:
I followed the vc demo doc here to create a virtual cluster.

But when I tried to create a pod in the vc, the pod stayed Pending indefinitely.
Describing the pod in the vc:

kubectl --kubeconfig vc-1.kubeconfig describe po test-deploy-5fbd8f7c8-mnzhx

output:

Name:             test-deploy-5fbd8f7c8-mnzhx
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=vc-test
                  pod-template-hash=5fbd8f7c8
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/test-deploy-5fbd8f7c8
Containers:
  poc:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      top
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqgp2 (ro)
Volumes:
  kube-api-access-qqgp2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason        Age                From       Message
  ----     ------        ----               ----       -------
  Warning  FailedCreate  6s (x12 over 16s)  vc-syncer  Error creating: failed to list services from cluster default-25ca04-vc-sample-1 cache: service is not ready

What did you expect to happen:
The pod should be running.

Anything else you would like to add:

It seems that the error occurs here: https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/4cc1422fd340d19c0d21b4f336dc664e45a12954/virtualcluster/pkg/syncer/resources/pod/dws.go#L334

Why do we need to check services in pPod.Namespace? There are apparently no services in that namespace unless I manually create one, because the namespace was just created by the syncer.

Is this a bug or is there anything I missed?

Environment:

  • cluster-api-provider-nested version: none
  • Minikube/KIND version: kind v0.16.0 go1.18.3 darwin/arm64
  • Kubernetes version: (use kubectl version): 1.25.2
  • OS (e.g. from /etc/os-release):

/kind bug

Hi, @LuBingtan. Which versions of the virtual cluster and root cluster are you running? Also, I did not understand this part: "There is apparently no services in that namespace". That doesn't make sense; it should have at least kubernetes.default.svc.

Hi, I have retried and found the root cause might be that the kubernetes.default.svc failed to be synced.
Error logs:

E1108 04:32:15.064617       1 dws.go:65] failed reconcile service default/kubernetes CREATE of cluster default-3a3ae6-vc-sample-1 Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.32.0.1"}: must be empty when `clusterIP` is not specified
E1108 04:32:15.064652       1 mccontroller.go:476] default/kubernetes dws request reconcile failed: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.32.0.1"}: must be empty when `clusterIP` is not specified

It looks like ClusterIPs should also be reset before creating.
https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/4cc1422fd340d19c0d21b4f336dc664e45a12954/virtualcluster/pkg/syncer/conversion/mutate.go#L422-L431
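To illustrate the idea, here is a minimal sketch in Go. It is not the syncer's actual code: `serviceSpec` is a hypothetical, trimmed-down stand-in for `corev1.ServiceSpec` (only the `ClusterIP` and `ClusterIPs` fields), `validateClusterIPs` mirrors only the single rule quoted in the error log above, and `resetClusterIPs` is an assumed helper name for the proposed fix.

```go
package main

import (
	"errors"
	"fmt"
)

// serviceSpec is a hypothetical stand-in for corev1.ServiceSpec,
// reduced to the two fields involved in the error above.
type serviceSpec struct {
	ClusterIP  string
	ClusterIPs []string
}

// validateClusterIPs mirrors only the rule quoted in the syncer log:
// clusterIPs must be empty when clusterIP is not specified.
// It is an illustration, not the API server's real validation.
func validateClusterIPs(spec serviceSpec) error {
	if spec.ClusterIP == "" && len(spec.ClusterIPs) > 0 {
		return errors.New("spec.clusterIPs: must be empty when `clusterIP` is not specified")
	}
	return nil
}

// resetClusterIPs (assumed name) clears both fields so that the
// receiving cluster can allocate its own addresses on create.
func resetClusterIPs(spec *serviceSpec) {
	spec.ClusterIP = ""
	spec.ClusterIPs = nil
}

func main() {
	// The state implied by the error log: clusterIP is already cleared,
	// but clusterIPs still carries the virtual cluster's address.
	spec := serviceSpec{ClusterIP: "", ClusterIPs: []string{"10.32.0.1"}}
	fmt.Println("before reset:", validateClusterIPs(spec))

	resetClusterIPs(&spec)
	fmt.Println("after reset:", validateClusterIPs(spec))
}
```

In the real mutator, the same two assignments would presumably be applied to the Service object before it is created in the root cluster.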

@wondywang What do you think? If this sounds OK, I can help fix it.

And FYI

  • root cluster version: 1.25.2
serverVersion:
  buildDate: "2022-09-22T05:28:27Z"
  compiler: gc
  gitCommit: 5835544ca568b757a8ecae5c153f317e5736700e
  gitTreeState: clean
  gitVersion: v1.25.2
  goVersion: go1.19.1
  major: "1"
  minor: "25"
  platform: linux/arm64
  • virtual cluster version: v1.22.13
serverVersion:
  buildDate: "2022-08-17T18:23:45Z"
  compiler: gc
  gitCommit: a43c0904d0de10f92aa3956c74489c45e6453d6e
  gitTreeState: clean
  gitVersion: v1.22.13
  goVersion: go1.16.15
  major: "1"
  minor: "22"
  platform: linux/arm64

thanks @LuBingtan

PTAL @Fei-Guo @christopherhein. It seems that it is indeed necessary to reset ClusterIPs here. We already do that internally.

Yes, looks like a bug. @wondywang can you fix it?

Agreed, seems like a great catch. thanks!

OK, I will.