Cannot re-run 1.12 playbook (e.g. to add nodes) - kubeadm RBAC issue
/kind bug
What steps did you take and what happened:
Ran the upgrade script from a 1.11.6 cluster to 1.12.7. The masters failed due to temporary API server unavailability and ansible aborted. kubectl get nodes showed that the masters had been upgraded successfully, so I re-ran the script to make sure all plays were performed; the script now fails on package install.
What did you expect to happen:
I expected the playbook to detect that no changes were necessary on the masters for the stages that had already succeeded, and to apply only the remaining changes.
Anything else you would like to add:
$ ansible-playbook -i clus.yaml wardroom/swizzle/upgrade.yml
...
TASK [kubernetes-master : add all of the kubernetes add-ons] ********************************************************************************
fatal: [arcadeqa-clus104-master1-c82e66.vm.qis.site.gs.com]: FAILED! => {"changed": true, "cmd": ["kubeadm", "alpha", "phase", "addon", "all", "--config", "/etc/kubernetes/kubeadm.conf"], "delta": "0:00:00.039686", "end": "2019-08-16 12:10:17.833855", "msg": "non-zero return code", "rc": 1, "start": "2019-08-16 12:10:17.794169", "stderr": "Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host", "stderr_lines": ["Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host"], "stdout": "", "stdout_lines": []}
fatal: [arcadeqa-clus104-master2-f69131.vm.qis.site.gs.com]: FAILED! => {"changed": true, "cmd": ["kubeadm", "alpha", "phase", "addon", "all", "--config", "/etc/kubernetes/kubeadm.conf"], "delta": "0:00:00.066816", "end": "2019-08-16 12:10:17.972071", "msg": "non-zero return code", "rc": 1, "start": "2019-08-16 12:10:17.905255", "stderr": "Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host", "stderr_lines": ["Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host"], "stdout": "", "stdout_lines": []}
fatal: [arcadeqa-clus104-master3-765238.vm.qis.site.gs.com]: FAILED! => {"changed": true, "cmd": ["kubeadm", "alpha", "phase", "addon", "all", "--config", "/etc/kubernetes/kubeadm.conf"], "delta": "0:00:00.123016", "end": "2019-08-16 12:10:18.019232", "msg": "non-zero return code", "rc": 1, "start": "2019-08-16 12:10:17.896216", "stderr": "Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host", "stderr_lines": ["Get https://api.c104.qis.site.gs.com:6443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp: lookup api.c104.qis.site.gs.com on 127.0.0.53:53: no such host"], "stdout": "", "stdout_lines": []}
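(Side note on this first failure: the stderr shows the name lookup going through systemd-resolved's local stub resolver at 127.0.0.53, the Ubuntu 18.04 default. A quick sanity check from a master - hostname taken from the log above - would be:
# getent hosts api.c104.qis.site.gs.com
# systemd-resolve --status
If getent returns nothing, the addon phase will fail regardless of what state the playbook is in.)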
$ ansible-playbook -i clus.yaml wardroom/swizzle/upgrade.yml
...
TASK [kubernetes : install kubernetes packages] *********************************************************************************************
FAILED - RETRYING: install kubernetes packages (5 retries left).
FAILED - RETRYING: install kubernetes packages (5 retries left).
FAILED - RETRYING: install kubernetes packages (5 retries left).
FAILED - RETRYING: install kubernetes packages (4 retries left).
FAILED - RETRYING: install kubernetes packages (4 retries left).
FAILED - RETRYING: install kubernetes packages (4 retries left).
FAILED - RETRYING: install kubernetes packages (3 retries left).
FAILED - RETRYING: install kubernetes packages (3 retries left).
FAILED - RETRYING: install kubernetes packages (3 retries left).
FAILED - RETRYING: install kubernetes packages (2 retries left).
FAILED - RETRYING: install kubernetes packages (2 retries left).
FAILED - RETRYING: install kubernetes packages (2 retries left).
FAILED - RETRYING: install kubernetes packages (1 retries left).
FAILED - RETRYING: install kubernetes packages (1 retries left).
FAILED - RETRYING: install kubernetes packages (1 retries left).
[WARNING]: Could not find aptitude. Using apt-get instead
fatal: [arcadeqa-clus104-master2-f69131.vm.qis.site.gs.com]: FAILED! => {"attempts": 5, "cache_update_time": 1565971943, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'kubernetes-cni=0.6.0-00'' failed: E: Packages were downgraded and -y was used without --allow-downgrades.\n", "rc": 100, "stderr": "E: Packages were downgraded and -y was used without --allow-downgrades.\n", "stderr_lines": ["E: Packages were downgraded and -y was used without --allow-downgrades."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following packages were automatically installed and are no longer required:\n conntrack cri-tools grub-pc-bin linux-headers-4.15.0-20\n linux-headers-4.15.0-20-generic linux-image-4.15.0-20-generic\n linux-modules-4.15.0-20-generic\nUse 'sudo apt autoremove' to remove them.\nThe following packages will be REMOVED:\n kubeadm kubelet\nThe following packages will be DOWNGRADED:\n kubernetes-cni\n0 upgraded, 0 newly installed, 1 downgraded, 2 to remove and 171 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following packages were automatically installed and are no longer required:", " conntrack cri-tools grub-pc-bin linux-headers-4.15.0-20", " linux-headers-4.15.0-20-generic linux-image-4.15.0-20-generic", " linux-modules-4.15.0-20-generic", "Use 'sudo apt autoremove' to remove them.", "The following packages will be REMOVED:", " kubeadm kubelet", "The following packages will be DOWNGRADED:", " kubernetes-cni", "0 upgraded, 0 newly installed, 1 downgraded, 2 to remove and 171 not upgraded."]}
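(For context on this second failure: apt refuses to downgrade packages non-interactively unless explicitly allowed, so pinning kubernetes-cni=0.6.0-00 against an installed 0.7.5-00 dies exactly as above. The downgrade itself could be forced with:
# apt-get install --allow-downgrades 'kubernetes-cni=0.6.0-00'
but, as it turned out below, the right fix was to drop the stale pin rather than allow the downgrade.)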
Environment:
- Wardroom version: branch 1.12
- OS (e.g. from /etc/os-release): ubuntu 18.04
Note that this also means nodes can't be added to the cluster. Adding a node requires running the install playbook, and the run must include the etcd nodes (so that the primary master has that variable set correctly), the primary_master (to obtain the kubeadm join token), and then the new nodes. However, the playbook fails when trying to install packages on the master, so it never gets as far as generating the kubeadm token - roughly the invocation sketched below.
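A sketch only, since wardroom's actual install playbook path and the new-node group name may differ from what I've written here:
$ ansible-playbook -i clus.yaml wardroom/swizzle/install.yml --limit 'etcd:primary_master:new_nodes'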
It looks like the root cause of my original comment was a kubernetes_cni_version: "0.6.0-00" variable set in the ansible inventory from a previous version of kubernetes. The upgrade scripts appear to ignore this variable, so those runs had worked and installed cni 0.7.5, but the install scripts do use it, and that caused the failure.
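For anyone hitting the same thing, the fix was to remove the stale pin from the inventory, or bump it to match what the upgrade actually installed - e.g., assuming the variable lives in the clus.yaml used above:
$ sed -i 's/kubernetes_cni_version: "0.6.0-00"/kubernetes_cni_version: "0.7.5-00"/' clus.yaml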
Unfortunately the playbook still can't be run - I'm now hitting this issue when adding a new node (kubernetes/kubeadm#907):
# /usr/bin/kubeadm join api.hostname.com:6443 --token=6uouog.xxxx --discovery-token-unsafe-skip-ca-verification --ignore-preflight-errors=all
...
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
configmaps "kubelet-config-1.12" is forbidden: User "system:bootstrap:6uouog" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
From the discussion on GitHub this has usually been due to a version mismatch, but here everything was installed/upgraded via wardroom and the versions appear to match.
On master:
# apt list --installed | grep kuber
cri-tools/kubernetes-xenial,now 1.12.0-00 amd64 [installed,automatic]
kubeadm/kubernetes-xenial,now 1.12.7-00 amd64 [installed,upgradable to: 1.14.3-00]
kubectl/kubernetes-xenial,now 1.12.7-00 amd64 [installed,upgradable to: 1.14.3-00]
kubelet/kubernetes-xenial,now 1.12.7-00 amd64 [installed,upgradable to: 1.14.3-00]
kubernetes-cni/kubernetes-xenial,now 0.7.5-00 amd64 [installed]
# kubelet --version
Kubernetes v1.12.7
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:49:02Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
On new node:
# apt list --installed | grep kuber
cri-tools/kubernetes-xenial,now 1.12.0-00 amd64 [installed,automatic]
kubeadm/kubernetes-xenial,now 1.12.7-00 amd64 [installed,upgradable to: 1.14.3-00]
kubectl/kubernetes-xenial,now 1.12.7-00 amd64 [installed,upgradable to: 1.14.3-00]
kubelet/kubernetes-xenial,now 1.12.7-00 amd64 [installed,upgradable to: 1.14.3-00]
kubernetes-cni/kubernetes-xenial,now 0.7.5-00 amd64 [installed]
# kubelet --version
Kubernetes v1.12.7
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:49:02Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
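One way to separate "missing ConfigMap" from "missing RBAC" from a master (the token name is taken from the error above; the group is the one kubeadm uses for bootstrap tokens by default, and the impersonation check assumes an admin kubeconfig):
# kubectl -n kube-system get configmap kubelet-config-1.12
# kubectl auth can-i get configmaps/kubelet-config-1.12 -n kube-system --as=system:bootstrap:6uouog --as-group=system:bootstrappers:kubeadm:default-node-token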
What is the state of the scoped token you are trying to use during this run? Are you sure that it has not expired?
I'm using the token generated by wardroom on the master - it fails during the wardroom node install. If I run kubeadm token list immediately on the master, the token is listed and seems valid, and if I run kubeadm join on the node manually with that token (all within a minute or so of the initial ansible run), it fails with the same error that wardroom got.
Is there a role/rolebinding being misconfigured that's supposed to allow the group system:bootstrappers:kubeadm:default-node-token to access those configmaps?
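One way to check what actually got created (the role/rolebinding names here follow kubeadm's usual naming convention and should be verified against a healthy 1.12 cluster):
# kubectl -n kube-system get role,rolebinding | grep -i kubeadm
# kubectl -n kube-system get rolebinding kubeadm:kubelet-config-1.12 -o yaml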
On the master:
root@arcadebackup-clus8-master1-9c250a:~# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
kghrxz.jtfd32hksthjp7m9 23h 2019-09-04T14:57:01-04:00 authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
On the node:
$ /usr/bin/kubeadm join api.c8....:6443 --token=kghrxz.jtfd32hksthjp7m9 --discovery-token-unsafe-skip-ca-verification --ignore-preflight-errors=all
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.2. Latest validated version: 18.06
[discovery] Trying to connect to API Server "api.c8....:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://api.c8....:6443"
[discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "api.c8....:6443"
[discovery] Successfully established connection with API Server "api.c8....:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
unable to fetch the kubeadm-config ConfigMap: failed to get config map: configmaps "kubeadm-config" is forbidden: User "system:bootstrap:kghrxz" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
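For the record, a possible manual workaround is to recreate the RBAC that kubeadm normally sets up for joining nodes. This is a sketch only - the role/rolebinding names follow kubeadm's 1.12 conventions and should be cross-checked against a working cluster before applying:
# kubectl -n kube-system create role kubeadm:kubelet-config-1.12 --verb=get --resource=configmaps --resource-name=kubelet-config-1.12
# kubectl -n kube-system create rolebinding kubeadm:kubelet-config-1.12 --role=kubeadm:kubelet-config-1.12 --group=system:bootstrappers:kubeadm:default-node-token
# kubectl -n kube-system create role kubeadm:nodes-kubeadm-config --verb=get --resource=configmaps --resource-name=kubeadm-config
# kubectl -n kube-system create rolebinding kubeadm:nodes-kubeadm-config --role=kubeadm:nodes-kubeadm-config --group=system:bootstrappers:kubeadm:default-node-token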