edgelesssys/constellation

Can't have worker nodes in miniconstellation cluster following tutorial (Failed to get IP in VPC)

revoltez opened this issue · 2 comments

Issue description

After running constellation mini up, I still don't have any worker nodes available. These are the logs I get:

kubectl logs -n kube-system daemonsets/join-service -f

{"level":"INFO","ts":"2023-12-25T15:10:49Z","caller":"cmd/main.go:57","msg":"Constellation Node Join Service","version":"v2.14.0","cloudProvider":"QEMU","attestationVariant":"qemu-vtpm"}
{"level":"INFO","ts":"2023-12-25T15:10:49Z","logger":"validator","caller":"watcher/validator.go:72","msg":"Updating expected measurements"}
{"level":"FATAL","ts":"2023-12-25T15:11:19Z","caller":"cmd/main.go:90","msg":"Failed to get IP in VPC","error":"Get \"http://10.42.0.1:8080/self\": context deadline exceeded"}

Then I checked the pods and found that the join service was crash-looping:

kube-system   join-service-mkxdq  0/1     CrashLoopBackOff   22 (4m33s ago)   105m

So there are no worker nodes, only the control plane.

So I deleted the join service pod and it restarted successfully, but still no worker nodes joined.
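
For reference, I deleted it with roughly the following command, using the pod name from the output above:
kubectl delete pod -n kube-system join-service-mkxdq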

And here is the list of events from kubectl get events -A:

NAMESPACE     LAST SEEN   TYPE      REASON             OBJECT                                 MESSAGE
kube-system   3m34s       Warning   FailedScheduling   pod/cilium-operator-7f8f557b9d-fqnl2   0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..
kube-system   3m34s       Warning   FailedScheduling   pod/coredns-8956f444c-x26r2            0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules..
kube-system   39m         Normal    Pulled             pod/join-service-lxdzj                 Container image "ghcr.io/edgelesssys/constellation/join-service:v2.14.0@sha256:c5cb0644b6c0519d0db1fd1e0986083e84b16c7bc90812669a7dc89aeb89ba4c" already present on machine
kube-system   4m28s       Warning   BackOff            pod/join-service-lxdzj                 Back-off restarting failed container join-service in pod join-service-lxdzj_kube-system(bd1cd3a2-f203-4dca-9573-143cff075e51)

And here is the list of all pods:

NAMESPACE     NAME                                                           READY   STATUS             RESTARTS       AGE
kube-system   cert-manager-6dfc87675f-jp95c                                  1/1     Running            0              76m
kube-system   cert-manager-cainjector-79dd56cf68-874bb                       1/1     Running            0              76m
kube-system   cert-manager-webhook-7797df8bdb-cqfdd                          1/1     Running            0              76m
kube-system   cilium-operator-7f8f557b9d-fqnl2                               0/1     Pending            0              76m
kube-system   cilium-operator-7f8f557b9d-jtkdd                               1/1     Running            0              76m
kube-system   cilium-z8xzz                                                   1/1     Running            0              76m
kube-system   constellation-operator-controller-manager-85c66946c4-tbrbv     2/2     Running            0              70m
kube-system   coredns-8956f444c-5lwwf                                        1/1     Running            0              76m
kube-system   coredns-8956f444c-x26r2                                        0/1     Pending            0              76m
kube-system   etcd-control-plane-0                                           1/1     Running            0              76m
kube-system   join-service-lxdzj                                             0/1     CrashLoopBackOff   17 (69s ago)   76m
kube-system   key-service-5ntc8                                              1/1     Running            0              76m
kube-system   konnectivity-agent-qhbcb                                       1/1     Running            0              73m
kube-system   kube-apiserver-control-plane-0                                 1/1     Running            0              76m
kube-system   kube-controller-manager-control-plane-0                        1/1     Running            0              76m
kube-system   kube-scheduler-control-plane-0                                 1/1     Running            0              76m
kube-system   node-maintenance-operator-controller-manager-5b6dcf6d8-dn422   1/1     Running            0              70m
kube-system   verification-service-z4hp2                                     1/1     Running            0              76m

Two of them are Pending (cilium-operator and coredns); I don't know if that is relevant or not.

Intel VT-x is enabled in the BIOS and all prerequisites are met.

kubectl get nodes output:

NAME              STATUS   ROLES           AGE   VERSION
control-plane-0   Ready    control-plane   83m   v1.27.8

PS: same result with QEMU.

Steps to reproduce the behavior

constellation mini up in a new directory

Version

Version: v2.14.0 (Enterprise build; see documentation for license agreement)
GitCommit: facaa6a
GitTreeState: clean
BuildDate: 2023-12-19T07:37:24
GoVersion: go1.21.5
Compiler: bazel/gc
Platform: linux/amd64

Constellation Config

No response

Hello,

thanks for creating this issue. Indeed, I can reproduce it. While we work on a fix for the next release, here are steps you can take to resolve this in your Constellation. Note that this only affects QEMU/MiniConstellation, so deploying Constellation on a CSP still works fine.

  1. Edit the ConfigMap kube-system/ip-masq-agent: remove 10.42.0.0/22 from the nonMasqueradeCIDRs list,
    e.g. via executing:
kubectl patch -n kube-system configmap ip-masq-agent --type merge -p '{"data":{"config": "{\"masqLinkLocal\":true,\"nonMasqueradeCIDRs\":[]}"}}'
  2. Restart the cilium DaemonSet,
    e.g. via executing:
kubectl rollout restart -n kube-system daemonset cilium
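
To confirm both changes were applied, you can check the patched ConfigMap and the rollout status with the standard kubectl commands, e.g.:
kubectl get configmap -n kube-system ip-masq-agent -o yaml
kubectl rollout status -n kube-system daemonset cilium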

Now the join service pod should eventually become healthy and the worker node should join.
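
You can watch this happening with the same commands as above, e.g.:
kubectl logs -n kube-system daemonsets/join-service -f
kubectl get nodes -w

Once the join service runs without further restarts, a worker node should show up next to control-plane-0 in the node list.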

@3u13r thank you so much, it's working now (best New Year's gift so far)!