Can't have worker nodes in miniconstellation cluster following tutorial (Failed to get IP in VPC)
revoltez opened this issue · 2 comments
Issue description
After running constellation mini up, I still don't have any worker node available. These are the logs I get:
kubectl logs -n kube-system daemonsets/join-service -f
{"level":"INFO","ts":"2023-12-25T15:10:49Z","caller":"cmd/main.go:57","msg":"Constellation Node Join Service","version":"v2.14.0","cloudProvider":"QEMU","attestationVariant":"qemu-vtpm"}
{"level":"INFO","ts":"2023-12-25T15:10:49Z","logger":"validator","caller":"watcher/validator.go:72","msg":"Updating expected measurements"}
{"level":"FATAL","ts":"2023-12-25T15:11:19Z","caller":"cmd/main.go:90","msg":"Failed to get IP in VPC","error":"Get \"http://10.42.0.1:8080/self\": context deadline exceeded"}
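The FATAL line shows the join service timing out while fetching its own metadata from http://10.42.0.1:8080/self. Since the log is structured JSON, the relevant fields can be pulled out when scanning long join-service logs; a minimal sketch (assumes python3 is available, with the log line copied from above — in practice you would pipe kubectl logs into it):

```shell
# Extract the msg and error fields from the structured FATAL log line.
line='{"level":"FATAL","ts":"2023-12-25T15:11:19Z","caller":"cmd/main.go:90","msg":"Failed to get IP in VPC","error":"Get \"http://10.42.0.1:8080/self\": context deadline exceeded"}'
printf '%s\n' "$line" | python3 -c 'import json,sys; e=json.load(sys.stdin); print(e["msg"] + ": " + e["error"])'
```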
Then I checked the pods and found that the join service was crash-looping:
kube-system join-service-mkxdq 0/1 CrashLoopBackOff 22 (4m33s ago) 105m
So there are no worker nodes, only the control plane.
I deleted the join-service pod and it restarted successfully; however, still no worker nodes joined.
Here is the list of events (kubectl get events -A):
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
kube-system 3m34s Warning FailedScheduling pod/cilium-operator-7f8f557b9d-fqnl2 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..
kube-system 3m34s Warning FailedScheduling pod/coredns-8956f444c-x26r2 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules..
kube-system 39m Normal Pulled pod/join-service-lxdzj Container image "ghcr.io/edgelesssys/constellation/join-service:v2.14.0@sha256:c5cb0644b6c0519d0db1fd1e0986083e84b16c7bc90812669a7dc89aeb89ba4c" already present on machine
kube-system 4m28s Warning BackOff pod/join-service-lxdzj Back-off restarting failed container join-service in pod join-service-lxdzj_kube-system(bd1cd3a2-f203-4dca-9573-143cff075e51)
And here is the list of all pods:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cert-manager-6dfc87675f-jp95c 1/1 Running 0 76m
kube-system cert-manager-cainjector-79dd56cf68-874bb 1/1 Running 0 76m
kube-system cert-manager-webhook-7797df8bdb-cqfdd 1/1 Running 0 76m
kube-system cilium-operator-7f8f557b9d-fqnl2 0/1 Pending 0 76m
kube-system cilium-operator-7f8f557b9d-jtkdd 1/1 Running 0 76m
kube-system cilium-z8xzz 1/1 Running 0 76m
kube-system constellation-operator-controller-manager-85c66946c4-tbrbv 2/2 Running 0 70m
kube-system coredns-8956f444c-5lwwf 1/1 Running 0 76m
kube-system coredns-8956f444c-x26r2 0/1 Pending 0 76m
kube-system etcd-control-plane-0 1/1 Running 0 76m
kube-system join-service-lxdzj 0/1 CrashLoopBackOff 17 (69s ago) 76m
kube-system key-service-5ntc8 1/1 Running 0 76m
kube-system konnectivity-agent-qhbcb 1/1 Running 0 73m
kube-system kube-apiserver-control-plane-0 1/1 Running 0 76m
kube-system kube-controller-manager-control-plane-0 1/1 Running 0 76m
kube-system kube-scheduler-control-plane-0 1/1 Running 0 76m
kube-system node-maintenance-operator-controller-manager-5b6dcf6d8-dn422 1/1 Running 0 70m
kube-system verification-service-z4hp2 1/1 Running 0 76m
Two of them are Pending, namely cilium-operator and coredns; I don't know if that is relevant or not.
Intel VT-x is enabled in the BIOS and all prerequisites are met.
kubectl get nodes
output:
NAME STATUS ROLES AGE VERSION
control-plane-0 Ready control-plane 83m v1.27.8
PS: same result with QEMU.
Steps to reproduce the behavior
constellation mini up
in a new directory
Version
Version: v2.14.0 (Enterprise build; see documentation for license agreement)
GitCommit: facaa6a
GitTreeState: clean
BuildDate: 2023-12-19T07:37:24
GoVersion: go1.21.5
Compiler: bazel/gc
Platform: linux/amd64
Constellation Config
No response
Hello,
thanks for creating this issue. Indeed, I can reproduce it. While we fix this for the next release, here are steps you can take to fix this in your Constellation. Note that this only affects QEMU/MiniConstellation; deploying Constellation on a CSP still works fine.
- Edit the ConfigMap kube-system/ip-masq-agent: remove 10.42.0.0/22 from the nonMasqueradeCIDRs list.
e.g. via executing:
kubectl patch -n kube-system configmap ip-masq-agent --type merge -p '{"data":{"config": "{\"masqLinkLocal\":true,\"nonMasqueradeCIDRs\":[]}"}}'
- Restart the cilium daemonset.
e.g. via executing:
kubectl rollout restart -n kube-system daemonset cilium
Now the join service pod should eventually become healthy and the worker node should join.
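To double-check the first step, the ConfigMap's config value should now contain an empty nonMasqueradeCIDRs list (after patching, the live string can be fetched with kubectl get configmap -n kube-system ip-masq-agent -o jsonpath='{.data.config}'). A minimal local sketch of that check, assuming python3 is available:

```shell
# The config string written by the kubectl patch in the first step:
patched='{"masqLinkLocal":true,"nonMasqueradeCIDRs":[]}'
# Verify that no CIDRs remain in the non-masquerade list (expects 0).
printf '%s\n' "$patched" | python3 -c 'import json,sys; c=json.load(sys.stdin); print(len(c["nonMasqueradeCIDRs"]))'
```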