chris-short/rak8s

Nodes not ready on raspberrypi

Closed this issue · 13 comments

OS running on Ansible host:

Linux Mint 18

Ansible Version (ansible --version):

2.5.1

Uploaded logs showing errors(rak8s/.log/ansible.log)

n/a

Raspberry Pi Hardware Version:

Raspi 3 B

Raspberry Pi OS & Version (cat /etc/os-release):

PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian

Detailed description of the issue:

I've set up 3 Raspberry Pis with 2018-03-13-raspbian-stretch-lite.img
and the Ansible scripts at tag 0.1.5 of this repo.
After a few reboots kubectl works, but
"sudo kubectl get nodes"
reports the master node and a worker node as NotReady.
On the master node, "kubectl describe node ..." reports the following:

runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized.
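
The kubelet reports "cni config uninitialized" when it finds no CNI network configuration, which by default it looks for in /etc/cni/net.d. A minimal sketch for checking that directory on a node follows; the helper name `cni_config_files` is hypothetical, and the suffix list assumes the extensions the CNI library accepts.

```python
from pathlib import Path

# Default directory the kubelet scans for CNI network configs.
DEFAULT_CNI_CONF_DIR = Path("/etc/cni/net.d")


def cni_config_files(conf_dir=DEFAULT_CNI_CONF_DIR):
    """Return the CNI config files found in conf_dir, sorted by name."""
    conf_dir = Path(conf_dir)
    if not conf_dir.is_dir():
        # A missing directory means no network add-on has written a config yet.
        return []
    return sorted(
        p for p in conf_dir.iterdir()
        if p.is_file() and p.suffix in {".conf", ".conflist", ".json"}
    )
```

An empty result on the master would be consistent with the message above: the network add-on has probably not been deployed, or has not yet written its config.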

After that I ran "sudo kubeadm init" directly on the master node (just to see what happens)
and got:

WARNING: [init] Using Kubernetes version: v1.10.1
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.04.0-ce. Max validated version: 17.03
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-2379]: Port 2379 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
CPU hardcapping unsupported
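
These preflight errors are what one would expect when kubeadm init is run on a node where the playbook has already initialized a control plane: the ports are held by the running components and the manifests already exist. A rough sketch that groups such output by check type is shown below; the function names are hypothetical and the "already initialized" test is only a heuristic.

```python
import re
from collections import defaultdict

# Matches lines such as "[ERROR Port-6443]: Port 6443 is in use".
ERROR_LINE = re.compile(r"\[ERROR (?P<check>[^\]]+)\]: (?P<message>.*)")


def group_preflight_errors(output):
    """Group kubeadm preflight errors by check family (Port, FileAvailable, ...)."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            # "Port-6443" -> "Port", "FileAvailable--etc-..." -> "FileAvailable"
            family = match.group("check").split("-", 1)[0]
            groups[family].append(match.group("message"))
    return dict(groups)


def looks_already_initialized(groups):
    """Heuristic: ports in use plus existing manifests suggest a running control plane."""
    return "Port" in groups and "FileAvailable" in groups
```

If the heuristic holds, the errors say nothing about the NotReady state itself; they only show that the node was initialized before.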

By the way: during the setup with the Ansible scripts I also hit bug #26.
I then executed the relevant command directly on my master node, and that succeeded.
After that I was able to rerun the playbook.
I am running the playbook from a laptop "outside" the Raspberry Pis.

Try the latest release and let me know how that goes, please: https://github.com/rak8s/rak8s/releases/tag/v0.2.0

I tried to install on two Raspberry Pis, named raspic0 and raspic1, each freshly flashed with 2018-03-13-raspbian-stretch-lite.img. I now use Ansible 2.5.2 on Linux Mint 18.2.

"ansible-playbook cluster.yml" results in


/ TASK [common : Pass bridged IPv4 traffic to iptables'
\ chains] /

    \   ^__^
     \  (oo)\_______
        (__)\       )\/\
            ||----w |
            ||     ||

fatal: [raspic1]: FAILED! => {"changed": false, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}
fatal: [raspic0]: FAILED! => {"changed": false, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}
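
The missing /proc/sys/net/bridge/bridge-nf-call-iptables key usually means the br_netfilter kernel module is not loaded at the moment sysctl runs. The sketch below checks both conditions; it assumes br_netfilter is what provides the key on the kernel in use, and the function name and path parameters are hypothetical (the paths are parameters only so the check can be tried against test files).

```python
from pathlib import Path


def bridge_netfilter_status(
    sysctl_path="/proc/sys/net/bridge/bridge-nf-call-iptables",
    modules_path="/proc/modules",
):
    """Report whether the bridge-nf sysctl exists and whether br_netfilter is loaded."""
    sysctl = Path(sysctl_path)
    modules = Path(modules_path)
    loaded = False
    if modules.is_file():
        # Each line of /proc/modules starts with the module name.
        loaded = any(
            line.split(" ", 1)[0] == "br_netfilter"
            for line in modules.read_text().splitlines()
        )
    # None means the sysctl key does not exist (the error seen above).
    value = sysctl.read_text().strip() if sysctl.is_file() else None
    return {"module_loaded": loaded, "sysctl_value": value}
```

A result of `{"module_loaded": False, "sysctl_value": None}` on a node would match the failure above.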

After that I stopped ansible-playbook with CTRL+C
and ran it again.

This time the task [common : Pass bridged IPv4 traffic to iptables' chains] succeeded,
but the run failed at "Run Docker Install Script" with


< TASK [kubeadm : Run Docker Install Script] >

    \   ^__^
     \  (oo)\_______
        (__)\       )\/\
            ||----w |
            ||     ||

fatal: [raspic1]: FAILED! => {"changed": true, "msg": "non-zero return code", "rc": 100, "stderr": "Shared connection to raspic1 closed.\r\n", "stdout": "# Executing docker install script, commit: 1d31602\r\n+ sh -c apt-get update -qq >/dev/null\r\n+ sh -c apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null\r\n+ sh -c curl -fsSL "https://download.docker.com/linux/raspbian/gpg\" | apt-key add -qq - >/dev/null\r\nWarning: apt-key output should not be parsed (stdout is not a terminal)\r\n+ sh -c echo "deb [arch=armhf] https://download.docker.com/linux/raspbian stretch edge" > /etc/apt/sources.list.d/docker.list\r\n+ [ raspbian = debian ]\r\n+ sh -c apt-get update -qq >/dev/null\r\n+ sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null\r\nE: Sub-process /usr/bin/dpkg returned an error code (1)\r\n", "stdout_lines": ["# Executing docker install script, commit: 1d31602", "+ sh -c apt-get update -qq >/dev/null", "+ sh -c apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null", "+ sh -c curl -fsSL "https://download.docker.com/linux/raspbian/gpg" | apt-key add -qq - >/dev/null", "Warning: apt-key output should not be parsed (stdout is not a terminal)", "+ sh -c echo "deb [arch=armhf] https://download.docker.com/linux/raspbian stretch edge" > /etc/apt/sources.list.d/docker.list", "+ [ raspbian = debian ]", "+ sh -c apt-get update -qq >/dev/null", "+ sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null", "E: Sub-process /usr/bin/dpkg returned an error code (1)"]}
fatal: [raspic0]: FAILED! => {"changed": true, "msg": "non-zero return code", "rc": 100, "stderr": "Shared connection to raspic0 closed.\r\n", "stdout": "# Executing docker install script, commit: 1d31602\r\n+ sh -c apt-get update -qq >/dev/null\r\n+ sh -c apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null\r\n+ sh -c curl -fsSL "https://download.docker.com/linux/raspbian/gpg\" | apt-key add -qq - >/dev/null\r\nWarning: apt-key output should not be parsed (stdout is not a terminal)\r\n+ sh -c echo "deb [arch=armhf] https://download.docker.com/linux/raspbian stretch edge" > /etc/apt/sources.list.d/docker.list\r\n+ [ raspbian = debian ]\r\n+ sh -c apt-get update -qq >/dev/null\r\n+ sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null\r\nE: Sub-process /usr/bin/dpkg returned an error code (1)\r\n", "stdout_lines": ["# Executing docker install script, commit: 1d31602", "+ sh -c apt-get update -qq >/dev/null", "+ sh -c apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null", "+ sh -c curl -fsSL "https://download.docker.com/linux/raspbian/gpg" | apt-key add -qq - >/dev/null", "Warning: apt-key output should not be parsed (stdout is not a terminal)", "+ sh -c echo "deb [arch=armhf] https://download.docker.com/linux/raspbian stretch edge" > /etc/apt/sources.list.d/docker.list", "+ [ raspbian = debian ]", "+ sh -c apt-get update -qq >/dev/null", "+ sh -c apt-get install -y -qq --no-install-recommends docker-ce >/dev/null", "E: Sub-process /usr/bin/dpkg returned an error code (1)"]}

After that I rebooted raspic0 and raspic1 manually.
A new "ansible-playbook cluster.yml" run then succeeded.

But the nodes are still NotReady.
If I log into the master (which is my raspic0), "sudo kubectl get nodes" responds with
NAME      STATUS     ROLES    AGE   VERSION
raspic0   NotReady   master   6m    v1.10.2
raspic1   NotReady   <none>   5m    v1.10.2

Additionally, the master still shows the following message when queried with "sudo kubectl describe node raspic0":

Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Tue, 01 May 2018 21:59:06 +0000  Tue, 01 May 2018 21:52:12 +0000  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Tue, 01 May 2018 21:59:06 +0000  Tue, 01 May 2018 21:52:12 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Tue, 01 May 2018 21:59:06 +0000  Tue, 01 May 2018 21:52:12 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure     False   Tue, 01 May 2018 21:59:06 +0000  Tue, 01 May 2018 21:52:12 +0000  KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready           False   Tue, 01 May 2018 21:59:06 +0000  Tue, 01 May 2018 21:52:12 +0000  KubeletNotReady             runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized. WARNING: CPU hardcapping unsupported
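
Only the Ready condition carries the actual cause here; the other conditions are healthy. A small sketch for pulling that reason out of the JSON form of a node (as printed by `kubectl get node raspic0 -o json`) might look like this; the function name is hypothetical.

```python
import json


def not_ready_reason(node_json):
    """Return (reason, message) if the node's Ready condition is not "True", else None."""
    node = json.loads(node_json)
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            if condition.get("status") == "True":
                return None
            return condition.get("reason"), condition.get("message")
    # No Ready condition was reported at all.
    return ("Unknown", "node reported no Ready condition")
```

For the output above, this would return the KubeletNotReady reason together with the "cni config uninitialized" message.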

That means that the latest master version didn't fix that problem.

Quote Chris-short: One of the things we definitely need to work on is reliability and
consistency. Sadly, I don't have a cluster to test with and the cluster I
created with rak8s is actively in use doing things here for me. Pretty sure
I'd be accosted for buying a three node cluster for testing.

Yes, testing is definitely important. I am looking into the Kubernetes end-to-end tests:

I will try to set it up. I can run tests on a 3 node cluster.

Hi Chris-short, hi tedsluis: I used the latest version of the development branch and now all the nodes are running.
But if I try to deploy a pod/service, it doesn't respond.

E.g. the commands:
$ sudo kubectl run hypriot --image=hypriot/rpi-busybox-httpd --replicas=3 --port=80
$ sudo kubectl get endpoints hypriot

results in the following:
$ sudo kubectl get endpoints hypriot
NAME      ENDPOINTS                                   AGE
hypriot   172.30.1.6:80,172.30.2.4:80,172.30.3.5:80   11m

but "curl 172.30.1.6:80" doesn't give a result.

On the other hand checking the services results in
$ sudo kubectl get svc
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
hypriot   ClusterIP   10.99.49.200   <none>        80/TCP    13m

That means that the cluster IP is in a completely different network than the endpoints.
Is that normal behavior?
What should I check to find out why the container doesn't respond to curl?
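
On the first question: Kubernetes allocates Service cluster IPs from a separate virtual range than pod IPs, so the two being in different networks is expected. The sketch below checks the addresses above against kubeadm's default Service subnet (10.96.0.0/12). The pod range 172.30.0.0/16 is only an assumption inferred from the endpoint addresses, not something confirmed by the rak8s configuration.

```python
import ipaddress

# kubeadm's default Service subnet (the --service-cidr default).
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")
# Assumed pod network, inferred only from the endpoint addresses above.
ASSUMED_POD_CIDR = ipaddress.ip_network("172.30.0.0/16")


def classify(address):
    """Say which of the two ranges an IPv4 address belongs to."""
    ip = ipaddress.ip_address(address)
    if ip in SERVICE_CIDR:
        return "service"
    if ip in ASSUMED_POD_CIDR:
        return "pod"
    return "other"
```

Here `classify("10.99.49.200")` falls in the Service range and `classify("172.30.1.6")` in the assumed pod range, so the address layout itself does not explain why curl gets no response.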

I'm going to close this. Please grab the latest version and try again. If there are bugs, please submit them.

I had the same issue, with nodes always "NotReady", and I am on the latest version, 1.12.2. Any help?