chris-short/rak8s

Fresh install failed on /proc/sys/net/bridge/bridge-nf-call-iptables issue and missing cgroups memory

tedsluis opened this issue · 1 comments

OS running on Ansible host:

pi@ansible-host ~/git/rak8s $  uname -a
Linux ansible-host-5 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 armv7l GNU/Linux

Ansible Version (ansible --version):

pi@ansible-host ~/git/rak8s $ ansible --version
ansible 2.2.0.0
  config file = /home/pi/git/rak8s/ansible.cfg
  configured module search path = Default w/o overrides

Uploaded logs showing errors(rak8s/.log/ansible.log):

2 runs:

  • First failed on TASK [common : Pass bridged IPv4 traffic to iptables' chains].
  • Second failed on TASK [master : Initialize Master].

pi@ansible-host ~/git/rak8s $ ansible-playbook cluster.yml 

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [node1]
ok: [node2]
ok: [master]

TASK [common : Enabling cgroup options at boot] ********************************
changed: [node1]
changed: [master]
changed: [node2]

TASK [common : Pass bridged IPv4 traffic to iptables' chains] ******************
fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}
fatal: [master]: FAILED! => {"changed": false, "failed": true, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}
fatal: [node2]: FAILED! => {"changed": false, "failed": true, "msg": "Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory\n"}

PLAY RECAP *********************************************************************
master                     : ok=2    changed=1    unreachable=0    failed=1   
node1                      : ok=2    changed=1    unreachable=0    failed=1   
node2                      : ok=2    changed=1    unreachable=0    failed=1   


pi@ansible-host ~/git/rak8s $ ansible-playbook cluster.yml 

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [node1]
ok: [node2]
ok: [master]

TASK [common : Enabling cgroup options at boot] ********************************
ok: [node2]
ok: [node1]
ok: [master]

TASK [common : Pass bridged IPv4 traffic to iptables' chains] ******************
ok: [node1]
ok: [master]
ok: [node2]

TASK [common : apt-get update] *************************************************
ok: [node2]
ok: [node1]
ok: [master]

TASK [common : apt-get upgrade] ************************************************
ok: [node2]
ok: [node1]
ok: [master]

TASK [common : Reboot] *********************************************************
skipping: [master]
skipping: [node1]
skipping: [node2]

TASK [common : Wait for Reboot] ************************************************
skipping: [master]
skipping: [node1]
skipping: [node2]

TASK [kubeadm : Disable Swap] **************************************************
changed: [master]
changed: [node1]
changed: [node2]

TASK [kubeadm : Determine if docker is installed] ******************************
ok: [master]
ok: [node2]
ok: [node1]

TASK [kubeadm : Run Docker Install Script] *************************************
changed: [node2]
changed: [node1]
changed: [master]

TASK [kubeadm : Install apt-transport-https] ***********************************
ok: [master]
ok: [node1]
ok: [node2]

TASK [kubeadm : Add Google Cloud Repo Key] *************************************
changed: [master]
 [WARNING]: Consider using get_url or uri module rather than running curl

changed: [node2]
changed: [node1]

TASK [kubeadm : Add Kubernetes to Available apt Sources] ***********************
changed: [node1]
changed: [master]
changed: [node2]

TASK [kubeadm : apt-get update] ************************************************
changed: [node2]
changed: [node1]
changed: [master]

TASK [kubeadm : Install k8s Y'all] *********************************************
changed: [node2] => (item=[u'kubelet', u'kubeadm', u'kubectl'])
changed: [node1] => (item=[u'kubelet', u'kubeadm', u'kubectl'])
changed: [master] => (item=[u'kubelet', u'kubeadm', u'kubectl'])

PLAY [master] ******************************************************************

TASK [master : Reset Kubernetes Master] ****************************************
changed: [master]

TASK [master : Initialize Master] **********************************************
fatal: [master]: FAILED! => {"changed": true, "cmd": "kubeadm init --apiserver-advertise-address=192.168.11.210 --token=udy29x.ugyyk3tumg27atmr", "delta": "0:00:02.406248", "end": "2018-05-11 20:32:27.185350", "failed": true, "rc": 2, "start": "2018-05-11 20:32:24.779102", "stderr": "\t
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03\n\t
[WARNING FileExisting-crictl]: crictl not found in system path\nSuggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl\n[preflight] Some fatal errors occurred:\n\t
[ERROR SystemVerification]: missing cgroups: memory\n[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`", "stdout": "
[init] Using Kubernetes version: v1.10.2\n[init] Using Authorization modes: [Node RBAC]\n[preflight] Running pre-flight checks.\n[preflight] 
The system verification failed. Printing the output from the verification:\n\u001b[0;37mKERNEL_VERSION\u001b[0m: \u001b[0;32m4.14.34-v7+\u001b[0m\n\u001b[0;37mCONFIG_NAMESPACES\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_NET_NS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_PID_NS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_IPC_NS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_UTS_NS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_CGROUPS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_CGROUP_CPUACCT\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_CGROUP_DEVICE\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_CGROUP_FREEZER\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_CGROUP_SCHED\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_CPUSETS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_MEMCG\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_INET\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_EXT4_FS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_PROC_FS\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCONFIG_NETFILTER_XT_TARGET_REDIRECT\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m\n\u001b[0;37mCONFIG_NETFILTER_XT_MATCH_COMMENT\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m\n\u001b[0;37mCONFIG_OVERLAY_FS\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m\n\u001b[0;37mCONFIG_AUFS_FS\u001b[0m: \u001b[0;33mnot set - Required for aufs.\u001b[0m\n\u001b[0;37mCONFIG_BLK_DEV_DM\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m\n\u001b[0;37mDOCKER_VERSION\u001b[0m: \u001b[0;32m18.05.0-ce\u001b[0m\n\u001b[0;37mOS\u001b[0m: \u001b[0;32mLinux\u001b[0m\n\u001b[0;37mCGROUPS_CPU\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCGROUPS_CPUACCT\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCGROUPS_CPUSET\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCGROUPS_DEVICES\u001b[0m: 
\u001b[0;32menabled\u001b[0m\n\u001b[0;37mCGROUPS_FREEZER\u001b[0m: \u001b[0;32menabled\u001b[0m\n\u001b[0;37mCGROUPS_MEMORY\u001b[0m: \u001b[0;31mmissing\u001b[0m", "stdout_lines": ["
[init] Using Kubernetes version: v1.10.2", "[init] Using Authorization modes: [Node RBAC]", "[preflight] Running pre-flight checks.", "
[preflight] The system verification failed. Printing the output from the verification:", "\u001b[0;37mKERNEL_VERSION\u001b[0m: \u001b[0;32m4.14.34-v7+\u001b[0m", "\u001b[0;37mCONFIG_NAMESPACES\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_NET_NS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_PID_NS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_IPC_NS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_UTS_NS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_CGROUPS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_CGROUP_CPUACCT\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_CGROUP_DEVICE\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_CGROUP_FREEZER\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_CGROUP_SCHED\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_CPUSETS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_MEMCG\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_INET\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_EXT4_FS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_PROC_FS\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCONFIG_NETFILTER_XT_TARGET_REDIRECT\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m", "\u001b[0;37mCONFIG_NETFILTER_XT_MATCH_COMMENT\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m", "\u001b[0;37mCONFIG_OVERLAY_FS\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m", "\u001b[0;37mCONFIG_AUFS_FS\u001b[0m: \u001b[0;33mnot set - Required for aufs.\u001b[0m", "\u001b[0;37mCONFIG_BLK_DEV_DM\u001b[0m: \u001b[0;32menabled (as module)\u001b[0m", "\u001b[0;37mDOCKER_VERSION\u001b[0m: \u001b[0;32m18.05.0-ce\u001b[0m", "\u001b[0;37mOS\u001b[0m: \u001b[0;32mLinux\u001b[0m", "\u001b[0;37mCGROUPS_CPU\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCGROUPS_CPUACCT\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCGROUPS_CPUSET\u001b[0m: 
\u001b[0;32menabled\u001b[0m", "\u001b[0;37mCGROUPS_DEVICES\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCGROUPS_FREEZER\u001b[0m: \u001b[0;32menabled\u001b[0m", "\u001b[0;37mCGROUPS_MEMORY\u001b[0m: \u001b[0;31mmissing\u001b[0m"], "warnings": []}

PLAY RECAP *********************************************************************
master                     : ok=14   changed=7    unreachable=0    failed=1   
node1                      : ok=13   changed=6    unreachable=0    failed=0   
node2                      : ok=13   changed=6    unreachable=0    failed=0 

Raspberry Pi Hardware Version:

3B+

Raspberry Pi OS & Version (cat /etc/os-release):

$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

$ uname -a
Linux master 4.14.34-v7+ #1110 SMP Mon Apr 16 15:18:51 BST 2018 armv7l GNU/Linux

Detailed description of the 3 issues:

I started this ansible-playbook cluster.yml install on a set of fresh Raspberry Pis (new Raspbian Lite image, release date 2018-04-18). I ran into the following issues:

    1. On the first attempt I ran into this error: Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory.
    2. On the second attempt I ran into this error: SystemVerification: missing cgroups: memory.
    3. Together with the missing cgroups: memory error, I got this warning: SystemVerification: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03.

Although you can work around these issues (via re-runs and reboots), it would be nice to fix them to improve the experience for new users.

1) Detailed description of the first issue:

Failed to reload sysctl: sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory.
After this error the playbook stops, because the key /proc/sys/net/bridge/bridge-nf-call-iptables didn’t exist. When you run the playbook a second time, the error doesn’t occur, because the key exists by then. It is not a real error, and I think it can be fixed by adding ignore_errors: yes to the sysctl task in the playbook. I will test it and then provide a pull request for this.
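A minimal sketch of how the missing key could be checked by hand on a node. It assumes the key is absent because the br_netfilter kernel module is not loaded yet, which the log above does not confirm. The helper names bridge_key_present and ensure_bridge_sysctl are hypothetical and not part of the playbook.

```shell
#!/bin/sh
# Path of the sysctl key named in the error message.
BRIDGE_KEY="/proc/sys/net/bridge/bridge-nf-call-iptables"

# Returns 0 if the given key file exists, non-zero otherwise.
bridge_key_present() {
    [ -e "$1" ]
}

# Hypothetical helper: if the key is missing, try loading br_netfilter
# (assumption: that module provides the key), then set the value.
# Needs root, so call it via sudo or from a root shell.
ensure_bridge_sysctl() {
    key="${1:-$BRIDGE_KEY}"
    if ! bridge_key_present "$key"; then
        echo "missing: $key (trying modprobe br_netfilter)"
        modprobe br_netfilter || return 1
    fi
    sysctl -w net.bridge.bridge-nf-call-iptables=1
}

# Usage on a node (not executed here):
#   bridge_key_present "$BRIDGE_KEY" && echo "key exists" || echo "key missing"
```

Checking for the key first, rather than ignoring the error, would show whether the second run really succeeded because the key had appeared.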

This issue was already reported in #13, closed without a solution.

2) Detailed description of the second issue:

SystemVerification: missing cgroups: memory.
From the Ansible log I can see that the task Enabling cgroup options at boot ran successfully on all the Raspberry Pis, as you can see below:

$ ls -l /boot/cmdline.txt
-rwxr-xr-x 1 root root 194 May 11 17:51 /boot/cmdline.txt
$ cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

I also see from the log that the reboot hasn’t taken place, as you can see below:

$ uptime
 21:38:16 up  12:14,  1 user,  load average: 0.03, 0.05, 0.07

This issue was already reported in #12, but closed without a solution.

I haven’t figured out why the reboots were skipped. The playbook modified /boot/cmdline.txt, but that didn’t trigger the reboot. When I modified the file afterwards and re-ran the playbook, it did trigger the reboot?! I have experienced this issue every time I start with fresh Raspbian images. Anyone?
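To tell "flag written but not yet booted with" apart from "flag never written", one could compare /boot/cmdline.txt with the command line of the running kernel in /proc/cmdline, and read the controller state from /proc/cgroups. The sketch below is only an illustration; the function names are hypothetical and the file paths are passed as parameters so the checks can be tried on copies.

```shell
#!/bin/sh
# Prints "enabled", "disabled" or "missing" for the memory controller.
# Input is a file in /proc/cgroups format:
#   subsys_name  hierarchy  num_cgroups  enabled
memory_cgroup_state() {
    awk '$1 == "memory" { print ($4 == 1 ? "enabled" : "disabled"); found = 1 }
         END { if (!found) print "missing" }' "${1:-/proc/cgroups}"
}

# Returns 0 if the kernel command line in file $1 contains flag $2
# as a whole space-separated word.
cmdline_has_flag() {
    tr ' ' '\n' < "$1" | grep -qx -- "$2"
}

# Returns 0 when cgroup_enable=memory is in the boot configuration ($1)
# but not in the running kernel's command line ($2): a reboot is pending.
reboot_pending() {
    cmdline_has_flag "${1:-/boot/cmdline.txt}" cgroup_enable=memory &&
        ! cmdline_has_flag "${2:-/proc/cmdline}" cgroup_enable=memory
}

# Usage on a node (not executed here):
#   memory_cgroup_state
#   reboot_pending && echo "reboot needed before kubeadm init"
```

If reboot_pending succeeds on a node, the cgroup task did its job and only the reboot is missing, which matches the uptime shown above.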

3) Detailed description of the warning:

The playbook installed Docker version 18.05.0-ce. Later, at the task Initialize Master (kubeadm init), the playbook warns that kubeadm has not validated any Docker version newer than 17.03. This is not an error, but it could cause issues, as I have seen before in production environments.

Docker version

$ sudo docker version
Client:
 Version:      18.05.0-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   f150324
 Built:        Wed May  9 22:24:36 2018
 OS/Arch:      linux/arm
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   f150324
  Built:        Wed May  9 22:20:37 2018
  OS/Arch:      linux/arm
  Experimental: false
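The version comparison behind the warning can be approximated in a few lines of shell. This is a sketch, not kubeadm's actual check: it assumes only the first two version components matter and relies on sort -V for the ordering. MAX_VALIDATED is taken from the warning text; the function names are hypothetical.

```shell
#!/bin/sh
# Highest Docker release line validated by kubeadm, per the warning text.
MAX_VALIDATED="17.03"

# Prints the first two dot-separated components, e.g. 18.05.0-ce -> 18.05.
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Returns 0 if version $1 sorts strictly after version $2 (uses sort -V).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Returns 0 if the given Docker version is newer than the validated line.
docker_too_new() {
    version_gt "$(major_minor "$1")" "$MAX_VALIDATED"
}

# Usage on a node (not executed here):
#   v=$(sudo docker version --format '{{.Server.Version}}')
#   docker_too_new "$v" && echo "Docker $v is newer than $MAX_VALIDATED"
```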

In the latest kubeadm documentation ( https://kubernetes.io/docs/setup/independent/install-kubeadm/ ) I found this comment about the Docker version:

Version v1.12 is recommended, but v1.11, v1.13 and 17.03 are known to work as well. 
Versions 17.06+ might work, but have not yet been tested and verified by the Kubernetes node team.

So the latest Docker release should no longer be used; we should install Docker 17.03. Currently the Docker install script used in the playbook task Run Docker Install Script doesn’t support installing a specific Docker version, see https://docs.docker.com/install/linux/docker-ce/debian/#install-using-the-convenience-script

The script does not provide options to specify which version of `Docker` to install, 
and installs the latest version that is released in the `edge` channel.

Another issue is that Docker 17.03 has become deprecated: https://docs.docker.com/release-notes/docker-ce/#17032-ce-2017-05-29
I changed the edge channel to stable channel within /etc/apt/sources.list.d/docker.list and I noticed that Docker 17.03 is no longer available:

$ sudo sed -i 's/edge/stable/' /etc/apt/sources.list.d/docker.list
$ sudo apt-get update
Hit:1 http://archive.raspberrypi.org/debian stretch InRelease
Hit:2 http://raspbian.raspberrypi.org/raspbian stretch InRelease          
Hit:4 https://download.docker.com/linux/raspbian stretch InRelease             
Hit:3 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
Get:5 https://download.docker.com/linux/raspbian stretch/stable armhf Packages [2,507 B]
Fetched 2,507 B in 2s (847 B/s)       
Reading package lists... Done
$ sudo apt-cache madison docker-ce 
 docker-ce | 18.03.1~ce-0~raspbian | https://download.docker.com/linux/raspbian stretch/stable armhf Packages
 docker-ce | 18.03.0~ce-0~raspbian | https://download.docker.com/linux/raspbian stretch/stable armhf Packages
 docker-ce | 17.12.1~ce-0~raspbian | https://download.docker.com/linux/raspbian stretch/stable armhf Packages
 docker-ce | 17.12.0~ce-0~raspbian | https://download.docker.com/linux/raspbian stretch/stable armhf Packages
 docker-ce | 17.09.1~ce-0~raspbian | https://download.docker.com/linux/raspbian stretch/stable armhf Packages
 docker-ce | 17.09.0~ce-0~raspbian | https://download.docker.com/linux/raspbian stretch/stable armhf Packages

Unfortunately, unlike Debian or other Linux distros, Raspbian doesn’t have its own Docker 17.03 package.

For many reasons it would be wise not to deviate from the Docker version that Kubernetes asks for. Docker 18.05 is a so-called edge release. For stability and compatibility it may be wise to switch over to stable Docker releases, see https://docs.docker.com/release-notes/docker-ce

To install a stable Docker version you must first uninstall Docker as follows:

$ sudo apt-get remove --auto-remove docker-ce
$ sudo rm -rf /var/lib/docker

Switch to stable:

$ sudo sed -i 's/edge/stable/' /etc/apt/sources.list.d/docker.list
$ sudo apt-get update

Choose a version:

$ sudo apt-cache madison docker-ce

Install your version:

$ sudo apt-get install docker-ce=<your version>
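The "choose a version" and "install your version" steps could be scripted roughly as below. This is a sketch under assumptions: pick_version is a hypothetical helper that takes the first (newest) entry in apt-cache madison output whose version starts with a given prefix, and holding the package with apt-mark is one possible way to keep a later apt-get upgrade from replacing it.

```shell
#!/bin/sh
# Reads `apt-cache madison docker-ce` output on stdin and prints the first
# listed version string that starts with the prefix given as $1.
pick_version() {
    awk -F'|' -v p="$1" '
        { gsub(/^[ \t]+|[ \t]+$/, "", $2) }   # trim the version column
        index($2, p) == 1 { print $2; exit }'
}

# Usage on a node (not executed here):
#   version=$(apt-cache madison docker-ce | pick_version 17.12)
#   sudo apt-get install -y "docker-ce=$version"
#   sudo apt-mark hold docker-ce
```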

Would it be wise to provide fixed versions for Docker and Kubernetes?

Fixed by #32