dajudge/kindcontainer

Using kindcontainer, the cluster fails on startup

jonathanvila opened this issue · 5 comments

Using Fedora 35.
When it tries to start the Kind cluster it fails with:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.

	Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
		- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

, stderr=I0104 14:15:14.794963      80 initconfiguration.go:246] loading configuration from "/kindcontainer/kubeadmConfig.yaml"
I0104 14:15:14.804300      80 certs.go:110] creating a new certificate authority for ca
I0104 14:15:15.041926      80 certs.go:487] validating certificate period for ca certificate
I0104 14:15:15.411537      80 certs.go:110] creating a new certificate authority for front-proxy-ca
I0104 14:15:15.596737      80 certs.go:487] validating certificate period for front-proxy-ca certificate
I0104 14:15:15.672613      80 certs.go:110] creating a new certificate authority for etcd-ca
I0104 14:15:15.855866      80 certs.go:487] validating certificate period for etcd/ca certificate
I0104 14:15:16.579837      80 certs.go:76] creating new public/private key files for signing service account users
I0104 14:15:16.847038      80 kubeconfig.go:101] creating kubeconfig file for admin.conf
I0104 14:15:16.918423      80 kubeconfig.go:101] creating kubeconfig file for kubelet.conf
I0104 14:15:17.086264      80 kubeconfig.go:101] creating kubeconfig file for controller-manager.conf
I0104 14:15:17.155503      80 kubeconfig.go:101] creating kubeconfig file for scheduler.conf
I0104 14:15:17.217075      80 kubelet.go:63] Stopping the kubelet
I0104 14:15:17.281208      80 manifests.go:96] [control-plane] getting StaticPodSpecs
I0104 14:15:17.281532      80 certs.go:487] validating certificate period for CA certificate
I0104 14:15:17.281616      80 manifests.go:109] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0104 14:15:17.281624      80 manifests.go:109] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0104 14:15:17.281628      80 manifests.go:109] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0104 14:15:17.281633      80 manifests.go:109] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0104 14:15:17.281637      80 manifests.go:109] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0104 14:15:17.287395      80 manifests.go:126] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0104 14:15:17.287416      80 manifests.go:96] [control-plane] getting StaticPodSpecs
I0104 14:15:17.287631      80 manifests.go:109] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0104 14:15:17.287638      80 manifests.go:109] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0104 14:15:17.287641      80 manifests.go:109] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0104 14:15:17.287645      80 manifests.go:109] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0104 14:15:17.287648      80 manifests.go:109] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0104 14:15:17.287652      80 manifests.go:109] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0104 14:15:17.287654      80 manifests.go:109] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0104 14:15:17.288703      80 manifests.go:126] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0104 14:15:17.288725      80 manifests.go:96] [control-plane] getting StaticPodSpecs
I0104 14:15:17.288984      80 manifests.go:109] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0104 14:15:17.289503      80 manifests.go:126] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I0104 14:15:17.290149      80 local.go:74] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0104 14:15:17.290168      80 waitcontrolplane.go:87] [wait-control-plane] Waiting for the API server to be healthy
I0104 14:15:17.290802      80 loader.go:372] Config loaded from file:  /etc/kubernetes/admin.conf
I0104 14:15:17.292935      80 round_trippers.go:454] GET https://172.17.0.3:6443/healthz?timeout=10s  in 0 milliseconds
I0104 14:15:17.793941      80 round_trippers.go:454] GET https://172.17.0.3:6443/healthz?timeout=10s  in 0 milliseconds
I0104 14:15:18.294133      80 round_trippers.go:454] GET https://172.17.0.3:6443/healthz?timeout=10s  in 0 milliseconds
I0104 14:15:18.794382      80 round_trippers.go:454] GET https://172.17.0.3:6443/healthz?timeout=10s  in 0 milliseconds
......
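
Following the troubleshooting hints in the output above, the kubelet journal and the control-plane containers can be inspected from inside the node container along these lines (sketch only; <node-container-id> is a placeholder for whatever docker ps / podman ps reports for the kindest/node image):

# find the node container started by kindcontainer
docker ps --filter ancestor=kindest/node:v1.21.1
# read the kubelet journal inside the node container
docker exec -it <node-container-id> journalctl -xeu kubelet --no-pager
# list the control-plane containers containerd started inside the node
docker exec -it <node-container-id> crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a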

In order to isolate the problem I've tried to execute it on the command line, and I'm not sure whether it's actually running.
Command:
podman run --tmpfs /run --tmpfs /tmp --privileged -e KUBECONFIG=/etc/kubernetes/admin.conf -v /var/lib/containerd --entrypoint "/usr/local/bin/entrypoint" --expose=6443 kindest/node:v1.21.1 /sbin/init

Log:

✔ docker.io/kindest/node:v1.21.1
Trying to pull docker.io/kindest/node:v1.21.1...
Getting image source signatures
Copying blob edf0ee5bab4f done  
Copying blob 7a1ddb7455a2 done  
Copying config 65d38077cb done  
Writing manifest to image destination
Storing signatures
INFO: running in a user namespace (experimental)
INFO: UserNS: faking /proc/sys/vm/overcommit_memory to be "1" (writable)
INFO: UserNS: faking /proc/sys/vm/panic_on_oom to be "0" (writable)
INFO: UserNS: faking /proc/sys/kernel/panic to be "10" (writable)
INFO: UserNS: faking /proc/sys/kernel/panic_on_oops to be "1" (writable)
INFO: UserNS: faking /proc/sys/kernel/keys/root_maxkeys to be "1000000" (writable)
INFO: UserNS: faking /proc/sys/kernel/keys/root_maxbytes to be "25000000" (writable)
INFO: changing snapshotter from "overlayfs" to "fuse-overlayfs"
INFO: enabling containerd-fuse-overlayfs service
Created symlink /etc/systemd/system/multi-user.target.wants/containerd-fuse-overlayfs.service → /etc/systemd/system/containerd-fuse-overlayfs.service.
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: faking /sys/class/dmi/id/product_uuid to be random
INFO: faking /sys/devices/virtual/dmi/id/product_uuid as well
INFO: setting iptables to detected mode: legacy
INFO: Detected IPv4 address: 10.0.2.100
INFO: Detected IPv6 address: ::ffff:10.0.2.100
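
To check whether anything actually came up inside that container, it can be probed from a second terminal roughly like this (sketch only; <node-container-id> is whatever podman ps reports):

# confirm the node container is still running
podman ps --filter ancestor=kindest/node:v1.21.1
# has systemd inside the container finished booting?
podman exec -it <node-container-id> systemctl is-system-running
# what state is the kubelet unit in?
podman exec -it <node-container-id> systemctl status kubelet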

Using docker instead of podman fails with:

INFO: Detected IPv4 address: 172.17.0.2
INFO: Detected IPv6 address: 
SELinux:  Could not open policy file <= /etc/selinux/targeted/policy/policy.33:  No such file or directory

Hi @jonathanvila

thanks for your detailed report.

I tried to recreate your issue by setting up Fedora 35 (Workstation) + Docker in a VM and running kindcontainer's Gradle build there. Unfortunately, I have not been able to reproduce the error you're seeing; the build worked fine.
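
In case you want to compare against exactly what I ran, it was just the stock build from a fresh checkout (assuming the checked-in Gradle wrapper):

# clone and run the default build (tests included)
git clone https://github.com/dajudge/kindcontainer.git
cd kindcontainer
./gradlew build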

The error message in your last post seems to point towards SELinux, but I must admit I am by no means an expert regarding SELinux, so I'm left a bit in the dark there at the moment.

As for podman: I'm only testing with Docker, and it's entirely possible that podman does not support running KinD.

If you can provide more details about your OS (especially the SELinux config) and/or your use case that would help me recreate the issues you're seeing, that'd be awesome!
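
For the SELinux side, the output of the stock status commands would already be a good starting point (nothing kindcontainer-specific here):

# overall SELinux mode: Enforcing / Permissive / Disabled
getenforce
# detailed status, including which policy is loaded
sestatus
# does the policy file the error message complains about exist at all?
ls -l /etc/selinux/targeted/policy/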

I formatted my system to use ext4 and now it seems to work :)
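
For reference, the filesystem type of the root mount (typically where the container storage lives) can be checked with the usual tools, e.g.:

# prints ext4, btrfs, xfs, ...
findmnt -no FSTYPE /
# or, with the filesystem type for every mount
df -T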