k8s-proxmox/cloud-provider-proxmox

No error message in log when ccm does not set the Provider ID?

Closed this issue · 2 comments

Hi @sp-yduck,

First of all, really nice project(s)! I follow every commit in your Proxmox project(s) via e-mail 🤓.
At the moment I just use your software. I am also trying to dip into Go, and especially into your code, but right now I do not have the time/motivation to learn so much new stuff.

Today I tried to get your newest version up and running, but I have the problem that the CCM does not set the provider ID in node.spec.providerID, even with CNI and CCM installed. I tried different CNIs (I could not get Cilium to work: "Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported"), but Calico (good old friend) worked.

The node is healthy and all pods are up (except coredns, because of the node.cloudprovider.kubernetes.io/uninitialized taint; after I removed the taint manually, coredns also came up). The log of the CCM also looks okay, but the providerID under node.spec is still missing.

Below are some snippets of my setup.
I would expect a message in the CCM log if the CCM is not able to set the providerID (or remove the taint).

Is this maybe a hint?

$ k --kubeconfig=kubeconfig.yaml logs -n kube-system kube-controller-manager-cappx-test-controlplane-gj5b9 | grep err
I0802 19:09:40.753236       1 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="controllerrevisions.apps"
I0802 19:09:50.872722       1 controllermanager.go:638] "Started controller" controller="clusterrole-aggregation"
I0802 19:09:50.872868       1 clusterroleaggregation_controller.go:189] "Starting ClusterRoleAggregator controller"
I0802 19:09:50.888903       1 actual_state_of_world.go:547] "Failed to update statusUpdateNeeded field in actual state of world" err="Failed to set statusUpdateNeeded to needed true, because nodeName=\"cappx-test-controlplane-gj5b9\" does not exist"

Some snippets of the status and logs of the cluster

$ k --kubeconfig=kubeconfig.yaml get pods -A
NAMESPACE     NAME                                                    READY   STATUS    RESTARTS       AGE
kube-system   calico-kube-controllers-68df4c59b7-przqw                1/1     Running   0              2m56s
kube-system   calico-node-l25bf                                       1/1     Running   0              5m59s
kube-system   coredns-5d78c9869d-lqhp6                                0/1     Pending   0              10m
kube-system   coredns-5d78c9869d-pw2q8                                0/1     Pending   0              10m
kube-system   etcd-cappx-test-controlplane-gj5b9                      1/1     Running   1 (11m ago)    11m
kube-system   kube-apiserver-cappx-test-controlplane-gj5b9            1/1     Running   1 (11m ago)    11m
kube-system   kube-controller-manager-cappx-test-controlplane-gj5b9   1/1     Running   2 (5m7s ago)   11m
kube-system   kube-proxy-jp476                                        1/1     Running   0              10m
kube-system   kube-scheduler-cappx-test-controlplane-gj5b9            1/1     Running   2 (5m7s ago)   11m
kube-system   kube-vip-cappx-test-controlplane-gj5b9                  1/1     Running   2 (5m8s ago)   11m

$ k --kubeconfig=kubeconfig.yaml get node cappx-test-controlplane-gj5b9 -o=jsonpath={.spec}
{"podCIDR":"10.244.0.0/24","podCIDRs":["10.244.0.0/24"],"taints":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane"},{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"}]}

$ k --kubeconfig=kubeconfig.yaml get node
NAME                            STATUS   ROLES           AGE   VERSION
cappx-test-controlplane-gj5b9   Ready    control-plane   18m   v1.27.3

$ k --kubeconfig=kubeconfig.yaml get events
LAST SEEN   TYPE      REASON                    OBJECT                               MESSAGE
20m         Normal    Starting                  node/cappx-test-controlplane-gj5b9   Starting kubelet.
20m         Warning   InvalidDiskCapacity       node/cappx-test-controlplane-gj5b9   invalid capacity 0 on image filesystem
20m         Normal    NodeAllocatableEnforced   node/cappx-test-controlplane-gj5b9   Updated Node Allocatable limit across pods
20m         Normal    NodeHasSufficientMemory   node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeHasSufficientMemory
20m         Normal    NodeHasNoDiskPressure     node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeHasNoDiskPressure
20m         Normal    NodeHasSufficientPID      node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeHasSufficientPID
19m         Normal    Starting                  node/cappx-test-controlplane-gj5b9   Starting kubelet.
19m         Warning   InvalidDiskCapacity       node/cappx-test-controlplane-gj5b9   invalid capacity 0 on image filesystem
19m         Normal    NodeHasSufficientMemory   node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeHasSufficientMemory
19m         Normal    NodeHasNoDiskPressure     node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeHasNoDiskPressure
19m         Normal    NodeHasSufficientPID      node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeHasSufficientPID
19m         Normal    NodeAllocatableEnforced   node/cappx-test-controlplane-gj5b9   Updated Node Allocatable limit across pods
19m         Normal    RegisteredNode            node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 event: Registered Node cappx-test-controlplane-gj5b9 in Controller
19m         Normal    Starting                  node/cappx-test-controlplane-gj5b9
13m         Normal    NodeReady                 node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 status is now: NodeReady
12m         Normal    RegisteredNode            node/cappx-test-controlplane-gj5b9   Node cappx-test-controlplane-gj5b9 event: Registered Node cappx-test-controlplane-gj5b9 in Controller

I would appreciate your reply very much!

It seems you haven't deployed the CCM (Cloud Controller Manager)? That log is from kube-controller-manager.
Make sure to install the CCM (https://github.com/sp-yduck/cloud-provider-proxmox/blob/master/manifests/cloud-controller-manager.yaml#L26).
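
For anyone hitting the same symptom, here is a minimal sketch of how the `.spec` output posted above could be checked for the two signs that no cloud controller manager has initialized the node: a missing `providerID` and a remaining `node.cloudprovider.kubernetes.io/uninitialized` taint. The helper name `check_node_spec` is hypothetical and not part of this project; the JSON is the one pasted in this issue.

```python
import json

# Taint key taken from the node spec posted in this issue.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def check_node_spec(spec_json: str) -> dict:
    """Hypothetical helper: report whether a node spec looks initialized.

    Expects the JSON printed by
    `kubectl get node <name> -o=jsonpath={.spec}`.
    """
    spec = json.loads(spec_json)
    # `taints` can be absent on an untainted node, so fall back to an empty list.
    taint_keys = [taint.get("key") for taint in (spec.get("taints") or [])]
    return {
        "has_provider_id": bool(spec.get("providerID")),
        "has_uninitialized_taint": UNINITIALIZED_TAINT in taint_keys,
    }


# The .spec output from this issue, copied verbatim.
spec_from_issue = (
    '{"podCIDR":"10.244.0.0/24","podCIDRs":["10.244.0.0/24"],'
    '"taints":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane"},'
    '{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"}]}'
)

result = check_node_spec(spec_from_issue)
print(result)
```

If both signs show up together, as they do for the spec in this issue, a reasonable first step is to confirm that a CCM pod is actually running in the cluster before looking for errors in its log.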

damn.... 🤐
Thank you!
Then I will have a look into the ClusterResourceSet; it seems like the resources aren't applied.
Feel free to close the issue; otherwise I will post an update on it.