Can't have worker nodes in miniconstellation cluster following tutorial (Failed to get IP in VPC)
revoltez opened this issue · 2 comments
Issue description
After running constellation mini up, I still don't have any worker node available. These are the logs I get:
kubectl logs -n kube-system daemonsets/join-service -f
{"level":"INFO","ts":"2023-12-25T15:10:49Z","caller":"cmd/main.go:57","msg":"Constellation Node Join Service","version":"v2.14.0","cloudProvider":"QEMU","attestationVariant":"qemu-vtpm"}
{"level":"INFO","ts":"2023-12-25T15:10:49Z","logger":"validator","caller":"watcher/validator.go:72","msg":"Updating expected measurements"}
{"level":"FATAL","ts":"2023-12-25T15:11:19Z","caller":"cmd/main.go:90","msg":"Failed to get IP in VPC","error":"Get \"http://10.42.0.1:8080/self\": context deadline exceeded"}
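The FATAL line shows the join service timing out while fetching its own metadata from http://10.42.0.1:8080/self. Since the log is structured JSON, the relevant fields can be pulled out when scanning long join-service logs; a minimal sketch (assumes python3 is available, with the log line copied from above — in practice you would pipe kubectl logs into it):

```shell
# Extract the msg and error fields from the structured FATAL log line.
line='{"level":"FATAL","ts":"2023-12-25T15:11:19Z","caller":"cmd/main.go:90","msg":"Failed to get IP in VPC","error":"Get \"http://10.42.0.1:8080/self\": context deadline exceeded"}'
printf '%s\n' "$line" | python3 -c 'import json,sys; e=json.load(sys.stdin); print(e["msg"] + ": " + e["error"])'
```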
Then I checked the pods and found that the join service was crash-looping:
kube-system join-service-mkxdq 0/1 CrashLoopBackOff 22 (4m33s ago) 105m
So there are no worker nodes, only the control plane.
I deleted the join-service pod and it restarted successfully; however, still no worker nodes joined.
Here is the list of events (kubectl get events -A):
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
kube-system 3m34s Warning FailedScheduling pod/cilium-operator-7f8f557b9d-fqnl2 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..
kube-system 3m34s Warning FailedScheduling pod/coredns-8956f444c-x26r2 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules..
kube-system 39m Normal Pulled pod/join-service-lxdzj Container image "ghcr.io/edgelesssys/constellation/join-service:v2.14.0@sha256:c5cb0644b6c0519d0db1fd1e0986083e84b16c7bc90812669a7dc89aeb89ba4c" already present on machine
kube-system 4m28s Warning BackOff pod/join-service-lxdzj Back-off restarting failed container join-service in pod join-service-lxdzj_kube-system(bd1cd3a2-f203-4dca-9573-143cff075e51)
And here is the list of all pods:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cert-manager-6dfc87675f-jp95c 1/1 Running 0 76m
kube-system cert-manager-cainjector-79dd56cf68-874bb 1/1 Running 0 76m
kube-system cert-manager-webhook-7797df8bdb-cqfdd 1/1 Running 0 76m
kube-system cilium-operator-7f8f557b9d-fqnl2 0/1 Pending 0 76m
kube-system cilium-operator-7f8f557b9d-jtkdd 1/1 Running 0 76m
kube-system cilium-z8xzz 1/1 Running 0 76m
kube-system constellation-operator-controller-manager-85c66946c4-tbrbv 2/2 Running 0 70m
kube-system coredns-8956f444c-5lwwf 1/1 Running 0 76m
kube-system coredns-8956f444c-x26r2 0/1 Pending 0 76m
kube-system etcd-control-plane-0 1/1 Running 0 76m
kube-system join-service-lxdzj 0/1 CrashLoopBackOff 17 (69s ago) 76m
kube-system key-service-5ntc8 1/1 Running 0 76m
kube-system konnectivity-agent-qhbcb 1/1 Running 0 73m
kube-system kube-apiserver-control-plane-0 1/1 Running 0 76m
kube-system kube-controller-manager-control-plane-0 1/1 Running 0 76m
kube-system kube-scheduler-control-plane-0 1/1 Running 0 76m
kube-system node-maintenance-operator-controller-manager-5b6dcf6d8-dn422 1/1 Running 0 70m
kube-system verification-service-z4hp2 1/1 Running 0 76m
Two of them are Pending, namely cilium-operator and coredns; I don't know if that is relevant or not.
Intel VT-x is enabled in the BIOS and all prerequisites are met.
kubectl get nodes
output:
NAME STATUS ROLES AGE VERSION
control-plane-0 Ready control-plane 83m v1.27.8
PS: same result with QEMU.
Steps to reproduce the behavior
constellation mini up
in a new directory
Version
Version: v2.14.0 (Enterprise build; see documentation for license agreement)
GitCommit: facaa6a
GitTreeState: clean
BuildDate: 2023-12-19T07:37:24
GoVersion: go1.21.5
Compiler: bazel/gc
Platform: linux/amd64
Constellation Config
No response
Hello,
thanks for creating this issue. Indeed, I can reproduce it. While we fix this for the next release, here are steps you can take to fix this in your Constellation. Note that this only affects QEMU/MiniConstellation; deploying Constellation on a CSP still works fine.
- Edit the ConfigMap kube-system/ip-masq-agent: remove 10.42.0.0/22 from the nonMasqueradeCIDRs list.
e.g. via executing:
kubectl patch -n kube-system configmap ip-masq-agent --type merge -p '{"data":{"config": "{\"masqLinkLocal\":true,\"nonMasqueradeCIDRs\":[]}"}}'
- Restart the cilium daemonset.
e.g. via executing:
kubectl rollout restart -n kube-system daemonset cilium
Now the join service pod should eventually become healthy and the worker node should join.
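To double-check the first step, the ConfigMap's config value should now contain an empty nonMasqueradeCIDRs list (after patching, the live string can be fetched with kubectl get configmap -n kube-system ip-masq-agent -o jsonpath='{.data.config}'). A minimal local sketch of that check, assuming python3 is available:

```shell
# The config string written by the kubectl patch in the first step:
patched='{"masqLinkLocal":true,"nonMasqueradeCIDRs":[]}'
# Verify that no CIDRs remain in the non-masquerade list (expects 0).
printf '%s\n' "$patched" | python3 -c 'import json,sys; c=json.load(sys.stdin); print(len(c["nonMasqueradeCIDRs"]))'
```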