projectcalico/canal

net-conf.json not mounted into node

felskrone opened this issue · 10 comments

I set up my cluster using canal, and all busybox pods and nodes can ping each other. But I had to work around a problem I don't fully understand.

The canal.yaml contains privileged containers that come with several binaries and configs. For example, the container

...
        containers:
          - name: calico-node
            image: quay.io/calico/node:v2.5.1
...

contains the 'calico' binary. This gets installed to '/opt/cni/bin/calico', and since the container is privileged, the calico binary shows up on the host itself as '/opt/cni/bin/calico'.

The same applies to the configs of the calico-node container, which are installed to '/etc/cni/net.d/' and also show up on the node itself:

worker03:/opt/cni/bin# ls -l /etc/cni/net.d/
total 12
-rw-rw-r-- 1 root root 1611 2017-11-07 15:35 10-calico.conflist
-rw-r--r-- 1 root root  273 2017-11-07 15:35 calico-kubeconfig
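
As far as I can tell, the files end up on the host via hostPath volumes in the manifest, not just because the container is privileged. A rough sketch of the relevant part (volume names and mount paths from memory, not verbatim canal.yaml):

            volumeMounts:
              # paths where the container drops the CNI plugin binary and config
              - name: cni-bin-dir
                mountPath: /host/opt/cni/bin
              - name: cni-net-dir
                mountPath: /host/etc/cni/net.d
        volumes:
          # hostPath volumes are shared with the node, which is why the files
          # show up under /opt/cni/bin and /etc/cni/net.d on the host
          - name: cni-bin-dir
            hostPath:
              path: /opt/cni/bin
          - name: cni-net-dir
            hostPath:
              path: /etc/cni/net.d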

Shouldn't that happen with the 'net-conf.json' config from the flannel container as well? From my understanding, it should show up on the node in '/etc/kube-flannel':

          - name: kube-flannel
            image: quay.io/coreos/flannel:v0.8.0
            command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
...
            volumeMounts:
            - name: run
              mountPath: /run
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
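
(Looking at the volumes section of canal.yaml, though, flannel-cfg appears to be backed by a ConfigMap rather than a hostPath, which would mean the file is only projected into the container; a sketch, possibly not verbatim:)

            volumes:
              # ConfigMap projection: visible inside the pod only,
              # never written to the node's filesystem
              - name: flannel-cfg
                configMap:
                  name: canal-config
              # hostPath, by contrast, is shared with the node
              - name: run
                hostPath:
                  path: /run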

On my nodes that is not the case, and my pods don't get an IP assigned. See below.

Expected Behavior

Canal assigns IPs to pods and the pods come up with networking fully functioning.

Current Behavior

With the default config for flannel from canal.yaml

      {
        "Network": "10.244.0.0/16",
        "Backend": {
          "Type": "vxlan"
        }
      }

my pods stay in the 'ContainerCreating' state and the kubelet log states 'cni config uninitialized'. The config is available in the kube-flannel container when I exec into it:

master01:/srv/salt/formulas/k8s-formula/kubernetes/iaac/canal# kubectl exec -ti canal-kq9vc /bin/sh -n kube-system -c kube-flannel

/ # cd /etc/kube-flannel/
/etc/kube-flannel # ls -l
total 0
lrwxrwxrwx    1 root     root            18 Nov  7 15:57 canal_iface -> ..data/canal_iface
lrwxrwxrwx    1 root     root            25 Nov  7 15:57 cni_network_config -> ..data/cni_network_config
lrwxrwxrwx    1 root     root            17 Nov  7 15:57 masquerade -> ..data/masquerade
lrwxrwxrwx    1 root     root            20 Nov  7 15:57 net-conf.json -> ..data/net-conf.json
/etc/kube-flannel # cat net-conf.json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}

But the kubelet log says 'cni config uninitialized'.

Nov 07 17:16:31 worker03 kubelet[897]: E1107 17:16:31.443948     897 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "busybox-w3_default(306aae5c-c3d7-11e7-9d9a-0022195f6b5b)" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_306aae5c-c3d7-11e7-9d9a-0022195f6b5b_0 in sandbox 2c4d4abf9049c94a95da3e6e129b015f420754185c31fafe0fe60144e8dc6d3e: cni config uninitialized
Nov 07 17:16:31 worker03 kubelet[897]: E1107 17:16:31.443975     897 kuberuntime_manager.go:618] createPodSandbox for pod "busybox-w3_default(306aae5c-c3d7-11e7-9d9a-0022195f6b5b)" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_306aae5c-c3d7-11e7-9d9a-0022195f6b5b_0 in sandbox 2c4d4abf9049c94a95da3e6e129b015f420754185c31fafe0fe60144e8dc6d3e: cni config uninitialized
Nov 07 17:16:31 worker03 kubelet[897]: E1107 17:16:31.444031     897 pod_workers.go:182] Error syncing pod 306aae5c-c3d7-11e7-9d9a-0022195f6b5b ("busybox-w3_default(306aae5c-c3d7-11e7-9d9a-0022195f6b5b)"), skipping: failed to "CreatePodSandbox" for "busybox-w3_default(306aae5c-c3d7-11e7-9d9a-0022195f6b5b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"busybox-w3_default(306aae5c-c3d7-11e7-9d9a-0022195f6b5b)\" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_306aae5c-c3d7-11e7-9d9a-0022195f6b5b_0 in sandbox 2c4d4abf9049c94a95da3e6e129b015f420754185c31fafe0fe60144e8dc6d3e: cni config uninitialized"

Workaround

If I place '/etc/cni/net.d/10-flannel.conf' with the following content onto the node

{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}

then on the next kubelet run the pods stuck in 'ContainerCreating' get created and an IP assigned. I don't have to reboot or restart anything.

I have retried this several times on different nodes.

Does that make any sense?

I don't want to mix node configs with ConfigMaps that way, and I also hesitate to edit the standard canal.yaml.

What could be the cause of this behaviour?

What configs do you need to be able to narrow this down?

Your Environment

  • Calico version: quay.io/calico/node:v2.5.1
  • Flannel version: quay.io/coreos/flannel:v0.8.0
  • Orchestrator version: Kubernetes 1.7.6 with RBAC
  • Operating System and version: Debian Stretch

@felskrone Only a single CNI network config is used; in this case that is 10-calico.conflist.

What version of Kubernetes are you using? If it says the config is uninitialized, then likely it isn't respecting the .conflist suffix, which is supported in k8s v1.7+, I think.

I think the workaround of installing the 10-flannel.conf file is not correct, as that will bypass Calico and so policy will not be enforced.

To fully understand this (sorry, these might be beginner questions...):

By putting the custom 10-flannel.conf onto the node, I'm bypassing Calico completely? That means my setup is solely using flannel/vxlan for networking even though I applied the canal.yaml?

Is that why all my nodes are routing through the flannel.1 device?

worker04:~# ip r s
default via 10.20.8.1 dev eno2 onlink
10.20.8.0/25 dev eno2 proto kernel scope link src 10.20.8.41
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.0.0/16 dev flannel.1
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
10.244.10.0/24 via 10.244.10.0 dev flannel.1 onlink
10.244.11.0/24 dev cni0 proto kernel scope link src 10.244.11.1

Regarding the *.conflist support: I just tried symlinking '10-calico.conflist' to '10-calico.conf' and got a new error message, 'no plugin name provided'. Here are the log entries.

Nov 07 20:10:11 worker03 kubelet[3925]: E1107 20:10:11.604519    3925 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_88d62a8d-c3ef-11e7-9d9a-0022195f6b5b_0 in sandbox 7a1ad45ffbe861e17836f2331240d180bbaf26019833a85c6006920fd14d8b6d: no plugin name provided
Nov 07 20:10:11 worker03 kubelet[3925]: E1107 20:10:11.604614    3925 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "busybox-w3_default(88d62a8d-c3ef-11e7-9d9a-0022195f6b5b)" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_88d62a8d-c3ef-11e7-9d9a-0022195f6b5b_0 in sandbox 7a1ad45ffbe861e17836f2331240d180bbaf26019833a85c6006920fd14d8b6d: no plugin name provided
Nov 07 20:10:11 worker03 kubelet[3925]: E1107 20:10:11.604641    3925 kuberuntime_manager.go:618] createPodSandbox for pod "busybox-w3_default(88d62a8d-c3ef-11e7-9d9a-0022195f6b5b)" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_88d62a8d-c3ef-11e7-9d9a-0022195f6b5b_0 in sandbox 7a1ad45ffbe861e17836f2331240d180bbaf26019833a85c6006920fd14d8b6d: no plugin name provided
Nov 07 20:10:11 worker03 kubelet[3925]: E1107 20:10:11.604794    3925 pod_workers.go:182] Error syncing pod 88d62a8d-c3ef-11e7-9d9a-0022195f6b5b ("busybox-w3_default(88d62a8d-c3ef-11e7-9d9a-0022195f6b5b)"), skipping: failed to "CreatePodSandbox" for "busybox-w3_default(88d62a8d-c3ef-11e7-9d9a-0022195f6b5b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"busybox-w3_default(88d62a8d-c3ef-11e7-9d9a-0022195f6b5b)\" failed: rpc error: code = 2 desc = failed to create network for container k8s_infra_busybox-w3_default_88d62a8d-c3ef-11e7-9d9a-0022195f6b5b_0 in sandbox 7a1ad45ffbe861e17836f2331240d180bbaf26019833a85c6006920fd14d8b6d: no plugin name provided"

So I'm guessing '10-calico.conf' is getting picked up but is not working properly for me?

Where do I go from here?

PS: Some more info regarding the kubelet configuration.

DAEMON_ARGS="\
--hostname-override worker03 \
--container-runtime=remote \
--container-runtime-endpoint=unix:///var/run/crio.sock \
--image-service-endpoint=unix:///var/run/crio.sock \
--enable-custom-metrics \
--image-pull-progress-deadline=2m \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
--cluster-dns=10.20.8.90 \
--require-kubeconfig \
--register-node=true \
--runtime-request-timeout=10m \
--pod-cidr=10.244.0.0/16 \
--allow-privileged=True \
--cni-bin-dir=/opt/cni/bin \
--cni-conf-dir=/etc/cni/net.d \
--network-plugin=cni \
--v=2 \
--file-check-frequency=5s \
--tls-cert-file=/etc/kubernetes/ssl/worker03.pem \
--tls-private-key-file=/etc/kubernetes/ssl/worker03-key.pem \
--node-labels=node-role.kubernetes.io/node=true"

I believe I have set all CNI-related config options correctly.
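
For what it's worth, this is how I watch the CNI-related kubelet messages (assuming the kubelet runs under systemd, as the journal output above suggests):

worker03:~# journalctl -u kubelet -f | grep -i cni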

By putting the custom 10-flannel.conf onto the node, I'm bypassing Calico completely? That means my setup is solely using flannel/vxlan for networking even though I applied the canal.yaml?

@felskrone yep, that's correct. You've essentially just installed flannel :)

So I'm guessing '10-calico.conf' is getting picked up but is not working properly for me?

Yeah, the "no plugin name provided" error sounds very much like you're using a version of Kubernetes that does not support CNI plugin lists, which the latest canal manifests use.

The kubelet in older versions of k8s will only pick up files ending in .conf, which is why you were seeing the 'cni config uninitialized' error earlier. By renaming it, you've made it so the kubelet can discover the file; however, the contents of the file are still only valid for a later version of k8s.
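
To illustrate: a .conf file describes a single plugin with a top-level "type", while a .conflist wraps plugins in a "plugins" array, so an older kubelet parsing a renamed .conflist as a .conf finds no top-level plugin name. A generic sketch of the two formats (not the actual canal-generated config):

A single-plugin .conf, the only format older kubelets understand:

{
  "name": "example-net",
  "type": "calico"
}

A .conflist plugin chain, understood only by newer kubelets:

{
  "name": "example-net",
  "plugins": [
    {
      "type": "calico"
    },
    {
      "type": "portmap"
    }
  ]
}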

What does kubectl version give back?

This outlines the different manifests to apply based on your kubernetes version: https://github.com/projectcalico/canal/tree/master/k8s-install#kubernetes-self-hosted-install

I'm running hyperkube at version

# kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:36:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:36:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

and have used this canal.yaml file, which is supposed to work with my k8s version.

@felskrone interesting, I wonder if the kubelet is also at that version (I'd expect v1.7.6 to work).

Could you try checking the kubelet version as well by passing it the --version flag?

It's all the same version because I'm just symlinking all executables to hyperkube.

worker03:~# kubelet --version
Kubernetes v1.7.6
worker03:~# which kubelet
/usr/local/bin/kubelet
worker03:~# ls -l /usr/local/bin/kubelet
lrwxrwxrwx 1 root root 24 2017-09-22 13:12 /usr/local/bin/kubelet -> /usr/local/bin/hyperkube
worker03:~# /usr/local/bin/hyperkube --version
Kubernetes v1.7.6

worker03:~# ls -l /usr/local/bin/
total 398936
-rwxr-xr-x 1 root root  79581168 2017-09-25 13:53 crio
-rwxr-xr-x 1 root root  34310103 2017-09-25 13:53 crioctl
-rwxr-x--x 1 root root 224972808 2017-09-22 13:12 hyperkube
-rwxr-xr-x 1 root root  56158936 2017-09-25 13:53 kpod
lrwxrwxrwx 1 root root        24 2017-09-22 13:12 kube-apiserver -> /usr/local/bin/hyperkube
lrwxrwxrwx 1 root root        24 2017-09-22 13:12 kube-controller-manager -> /usr/local/bin/hyperkube
lrwxrwxrwx 1 root root        24 2017-09-22 13:12 kubectl -> /usr/local/bin/hyperkube
lrwxrwxrwx 1 root root        24 2017-09-22 13:12 kubelet -> /usr/local/bin/hyperkube
lrwxrwxrwx 1 root root        24 2017-09-22 13:12 kube-proxy -> /usr/local/bin/hyperkube
lrwxrwxrwx 1 root root        24 2017-09-22 13:12 kube-scheduler -> /usr/local/bin/hyperkube
-rwxr-xr-x 1 root root  13473904 2017-09-25 13:53 runc

PS: Just updated to 1.7.10, but the problem persists.
PPS: Also tried the standalone kubelet binary, but no go.

worker03:/usr/local/bin# kubelet --version
Kubernetes v1.7.10

Any update? I'd really prefer routing over flannel/vxlan :)

tmjd commented

@felskrone Did you ever make any progress with your setup?

@tmjd Nope, I had to work around it by supplying my own config (it's Saltstack-managed, so no big problem). I have not tried any newer versions for a while, though.

tmjd commented

I'm going to close this issue for now. If you (or anyone else) want to try with newer versions and dig into this again, we can re-open this issue or a new one can be created.