kubernetes-retired/kube-aws

"Unit kubelet.service is not loaded properly: Exec format error." on kube-aws 0.14.0-rc.2

cw-sakamoto opened this issue · 1 comment

When I built a cluster with v0.14.0-rc.2, CloudFormation reported an error. Running `journalctl` on the affected host shows the following:

Jun 27 04:53:46 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: /run/systemd/system/kubelet.service:20: Unbalanced quoting, ignoring: "/bin/sh -c "exec /usr/lib/coreos/kubelet-wrapper  --cni-conf-dir=/etc/kubernetes/cni/net.d >
Jun 27 04:53:46 ip-10-0-36-219.ap-northeast-1.compute.internal bash[1079]: 2019/06/27 04:53:46 Failed to apply cloud-config: Unit kubelet.service is not loaded properly: Exec format error.
Jun 27 04:53:46 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: coreos-cloudinit-817810032.service: Main process exited, code=exited, status=1/FAILURE
Jun 27 04:53:46 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: coreos-cloudinit-817810032.service: Failed with result 'exit-code'.
Jun 27 05:04:48 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: Created slice system-sshd.slice.
Jun 27 05:04:48 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: Started OpenSSH per-connection server daemon (150.249.210.146:57086).
Jun 27 05:04:48 ip-10-0-36-219.ap-northeast-1.compute.internal sshd[1431]: /etc/ssh/sshd_config line 1: Deprecated option UsePrivilegeSeparation
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal sshd[1431]: Accepted publickey for core from 150.249.210.146 port 57086 ssh2: RSA SHA256:aWQG8rOOwrjxInq+2JlYT17l20D7wQKQC14jiViR2Nk
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal sshd[1431]: pam_unix(sshd:session): session opened for user core by (uid=0)
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: Created slice User Slice of core.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: Starting User Manager for UID 500...
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd-logind[902]: New session 1 of user core.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: Started Session 1 of user core.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: pam_unix(systemd-user:session): session opened for user core by (uid=0)
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: Reached target Sockets.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: Reached target Timers.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: Reached target Paths.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: Reached target Basic System.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: Reached target Default.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1434]: Startup finished in 16ms.
Jun 27 05:04:50 ip-10-0-36-219.ap-northeast-1.compute.internal systemd[1]: Started User Manager for UID 500.

When I exported and inspected the userdata, a blank line had been inserted into the middle of the backslash-continued `ExecStart` command of kubelet.service:

    - name: kubelet.service
      command: start
      runtime: true
      content: |
        [Unit]
        Wants=rpc-statd.service        
        Wants=decrypt-assets.service
        After=decrypt-assets.service
        
        [Service]
        EnvironmentFile=/etc/environment
        EnvironmentFile=-/etc/default/kubelet
        Environment=KUBELET_IMAGE_TAG=v1.14.3
        Environment=KUBELET_IMAGE_URL=docker://k8s.gcr.io/hyperkube-amd64
        Environment="RKT_RUN_ARGS=--insecure-options=image \
        --volume dns,kind=host,source=/etc/resolv.conf \
        --mount volume=dns,target=/etc/resolv.conf \
        --volume var-lib-cni,kind=host,source=/var/lib/cni \
        --mount volume=var-lib-cni,target=/var/lib/cni \
        --volume var-run-calico,kind=host,source=/var/run/calico \
        --mount volume=var-run-calico,target=/var/run/calico \
        --volume var-lib-calico,kind=host,source=/var/lib/calico \
        --mount volume=var-lib-calico,target=/var/lib/calico \
        --volume var-log,kind=host,source=/var/log \
        --mount volume=var-log,target=/var/log \
        --volume cni-bin,kind=host,source=/opt/cni/bin \
        --mount volume=cni-bin,target=/opt/cni/bin"
        ExecStartPre=/usr/bin/mkdir -p /var/lib/cni
        ExecStartPre=/usr/bin/mkdir -p /var/log/containers
        ExecStartPre=/usr/bin/mkdir -p /opt/cni/bin
        ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
        ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/cni/net.d
        ExecStartPre=/usr/bin/mkdir -p /var/run/calico
        ExecStartPre=/usr/bin/mkdir -p /var/lib/calico
        ExecStartPre=/bin/sed -e "s/COREOS_PRIVATE_IPV4/${COREOS_PRIVATE_IPV4}/g" -i /etc/kubernetes/config/kubelet.yaml
        ExecStart=/bin/sh -c "exec /usr/lib/coreos/kubelet-wrapper \
        --cni-conf-dir=/etc/kubernetes/cni/net.d \
        --cni-bin-dir=/opt/cni/bin \
        --network-plugin=cni \
        --container-runtime=docker \
        --node-labels=node.kubernetes.io/role="node",node.kubernetes.io/role="spot-worker",kubernetes.io/role=node,node-role.kubernetes.io/node=\"\",node-role.kubernetes.io/spot-worker=\"\",kube-aws.coreos.com/role=app-worker \
        --register-node=true \
        --config=/etc/kubernetes/config/kubelet.yaml \
        
        --cloud-provider=aws \
        --cert-dir=/etc/kubernetes/ssl \
        --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig/worker-bootstrap.yaml \
        --kubeconfig=/etc/kubernetes/kubeconfig/kubelet.yaml \
        $KUBELET_OPTS"
        Restart=always
        RestartSec=10
        [Install]
        WantedBy=multi-user.target

Looking at https://github.com/kubernetes-incubator/kube-aws/blob/v0.14.0-rc.2/builtin/files/userdata/cloud-config-worker#L356-L358, I suspect the Go template's whitespace handling is wrong. What do you think?

I think it should be:

        {{if .Taints -}}
        --register-with-taints={{.Taints.String}} \
        {{ end -}}
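The behavior can be reproduced with a few lines of Go (a sketch; the `data` struct and template strings below are simplified stand-ins for kube-aws's actual template context). Without the `-` trim markers, `text/template` preserves the newlines surrounding the `{{if}}`/`{{end}}` actions even when the condition is false, leaving the blank line seen above inside the backslash-continued `ExecStart`, which is what systemd then rejects with "Unbalanced quoting":

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical stand-in for the template data; the real .Taints in
// kube-aws is a richer type, but an empty string is falsy the same way.
type data struct{ Taints string }

// Without trim markers: the newlines around {{if}}/{{end}} survive even
// when the condition is false, so a blank line lands in the middle of a
// backslash-continued command.
const broken = `--config=/etc/kubernetes/config/kubelet.yaml \
{{if .Taints}}
--register-with-taints={{.Taints}} \
{{ end }}
--cloud-provider=aws \`

// With "-}}": trailing whitespace (including the newline) after each
// action is trimmed, so no blank line is emitted.
const fixed = `--config=/etc/kubernetes/config/kubelet.yaml \
{{if .Taints -}}
--register-with-taints={{.Taints}} \
{{ end -}}
--cloud-provider=aws \`

func render(tmpl string, d data) string {
	var buf bytes.Buffer
	template.Must(template.New("u").Parse(tmpl)).Execute(&buf, d)
	return buf.String()
}

func hasBlankLine(s string) bool { return strings.Contains(s, "\n\n") }

func main() {
	d := data{} // no taints configured, as on the failing worker
	fmt.Println("broken has blank line:", hasBlankLine(render(broken, d)))
	fmt.Println("fixed has blank line:", hasBlankLine(render(fixed, d)))
}
```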

Thanks for catching this! I've opened PRs to fix the issue in the v0.13.x and v0.14.x releases.