flannel-io/flannel

Kubernetes 1.12 and flannel does not work out of the box

outcoldman opened this issue · 32 comments

Seems like a new behavior with kubeadm, after I created a master, I see two taints on the master node:

Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule

But https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml has toleration only to

- key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

I added a toleration to kube-flannel.yml to solve the issue:

      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoSchedule

Expected Behavior

The docs should work with flannel out of the box
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

Current Behavior

Possible Solution

Maybe instead it should use a toleration without a key?

tolerations:
        - effect: NoSchedule
          operator: Exists

Steps to Reproduce (for bugs)

  1. Bootstrap master node with kubeadm
  2. Apply https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml as suggested in the docs.

Context

Your Environment

  • Flannel version: v0.10.0
  • Backend used (e.g. vxlan or udp):
  • Etcd version:
  • Kubernetes version (if used): 1.12
  • Operating System and version: Linux master1 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux, "Ubuntu 16.04.5 LTS"
  • Link to your project (optional):

I can confirm as well: on 1.11.3 the configuration applies correctly. On 1.12.0 it does not.

Using the toleration without a key worked for me. Would this be the solution?

That sounds fine to me - flannel should probably tolerate all NoSchedule taints, since it's a critical piece of infrastructure.

Anyone want to submit a PR?

@caseydavenport I have submitted PR against master https://github.com/coreos/flannel/pull/1045/files

But it will be good to have the same fix for the tag v0.10.0, considering that in a lot of places there is a reference to this path https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml

Considering that this is just a configuration change, maybe make a release v0.10.1 and update the Kubernetes documentation?

thanks @outcoldman. it helps :)

thanks @outcoldman ! it works like a charm. ;)

Flannel should probably set

tolerations:
        - operator: Exists

as the default tolerations set. This will ensure that the flannel ds tolerates all taints.
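To see why a bare `operator: Exists` toleration covers everything: with both `key` and `effect` unset, it matches any taint, so the scheduler-side check leaves no untolerated taints. A simplified model of that check (illustrative only, not the real scheduler predicate):

```python
def untolerated(node_taints, pod_tolerations):
    """Taints the pod does NOT tolerate; scheduling onto the node
    requires this list to be empty (simplified scheduler predicate)."""
    def matches(tol, taint):
        key_ok = ((tol.get("operator") == "Exists" and not tol.get("key"))
                  or tol.get("key") == taint["key"])
        effect_ok = not tol.get("effect") or tol.get("effect") == taint["effect"]
        return key_ok and effect_ok
    return [t for t in node_taints
            if not any(matches(tol, t) for tol in pod_tolerations)]

master_taints = [
    {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"},
    {"key": "node.kubernetes.io/not-ready", "effect": "NoSchedule"},
]
# The v0.10.0 manifest tolerates only the master taint:
v010 = [{"key": "node-role.kubernetes.io/master",
         "operator": "Exists", "effect": "NoSchedule"}]
print([t["key"] for t in untolerated(master_taints, v010)])
# -> ['node.kubernetes.io/not-ready']
# A bare `operator: Exists` toleration leaves nothing untolerated:
print(untolerated(master_taints, [{"operator": "Exists"}]))  # -> []
```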

For anyone willing to test the flannel fix for 1.12 ,
kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

For anyone willing to test the flannel fix for 1.12 ,
kubeadm -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

Trying on a Pi 2 B+ master:

```
HypriotOS/armv7: root@piNode01 in ~
$ kubeadm -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
Error: unknown command "apply" for "kubeadm"
Run 'kubeadm --help' for usage.
error: unknown command "apply" for "kubeadm"
```

So that got me closer, but still no dice. Here is the docker output; the apiserver container seems unhealthy. Sorry, I'm struggling with text formatting, so here is a screenshot instead.
[screenshot: kubectl apply / docker output]

Hi @NerdyShawn,

I don't think you've got your kubectl configured correctly to connect to your cluster. As it seems like @rberg2 has managed to get this working, maybe it would be good to continue this on one of the support channels like slack rather than this issue.

Sorry, it was a typo, it's kubectl.

For those interested, k8s 1.12 deployment with all the goodies (ingress, dashboard, optional vsphere*, etc) automated with ansible and maintained here: github.com/ReSearchITEng/kubeadm-playbook/
The above has been scripted there as well.

@ReSearchITEng, I confirm it works (1.12.1).
The link to the ansible playbook is broken.

Hello,
Even with the tolerations, it still fails. I used the link below to run flannel:

https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

Please find the output of the pods:-

[user@darshan-p-hegde-89ca8c531 ~]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-9r27x 0/1 ContainerCreating 0 6m
coredns-576cbf47c7-qc4tm 0/1 ContainerCreating 0 6m
etcd-darshan-p-hegde-89ca8c531.mylabserver.com 1/1 Running 0 4m54s
kube-apiserver-darshan-p-hegde-89ca8c531.mylabserver.com 1/1 Running 0 5m2s
kube-controller-manager-darshan-p-hegde-89ca8c531.mylabserver.com 1/1 Running 0 5m2s
kube-flannel-ds-amd64-gm5z7 0/1 CrashLoopBackOff 5 4m56s
kube-proxy-mbtcj 1/1 Running 0 6m
kube-scheduler-darshan-p-hegde-89ca8c531.mylabserver.com 1/1 Running 0 5m13s

I have described the flannel pod and the output is below:

Name: kube-flannel-ds-amd64-gm5z7
Namespace: kube-system
Priority: 0
PriorityClassName:
Node: darshan-p-hegde-89ca8c531.mylabserver.com/172.31.42.12
Start Time: Sun, 07 Oct 2018 06:37:31 +0000
Labels: app=flannel
controller-revision-hash=6697bf5fc6
pod-template-generation=1
tier=node
Annotations:
Status: Running
IP: 172.31.42.12
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://b085e4a7d80b26730dc795d4a72b8a278ddc4ba71e5c463bfcd0172b793de349
Image: quay.io/coreos/flannel:v0.10.0-amd64
Image ID: docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
Port:
Host Port:
Command:
cp
Args:
-f
/etc/kube-flannel/cni-conf.json
/etc/cni/net.d/10-flannel.conflist
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 07 Oct 2018 06:37:33 +0000
Finished: Sun, 07 Oct 2018 06:37:33 +0000
Ready: True
Restart Count: 0
Environment:
Mounts:
/etc/cni/net.d from cni (rw)
/etc/kube-flannel/ from flannel-cfg (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flannel-token-llwn4 (ro)
Containers:
kube-flannel:
Container ID: docker://a8096a56009a0566b53e4b0aac09430b75120979e63dbe32eb8ed91053666a77
Image: quay.io/coreos/flannel:v0.10.0-amd64
Image ID: docker-pullable://quay.io/coreos/flannel@sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
Port:
Host Port:
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 07 Oct 2018 06:43:46 +0000
Finished: Sun, 07 Oct 2018 06:43:48 +0000
Ready: False
Restart Count: 6
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: kube-flannel-ds-amd64-gm5z7 (v1:metadata.name)
POD_NAMESPACE: kube-system (v1:metadata.namespace)
Mounts:
/etc/kube-flannel/ from flannel-cfg (rw)
/run from run (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flannel-token-llwn4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
run:
Type: HostPath (bare host directory volume)
Path: /run
HostPathType:
cni:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
flannel-cfg:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-flannel-cfg
Optional: false
flannel-token-llwn4:
Type: Secret (a volume populated by a Secret)
SecretName: flannel-token-llwn4
Optional: false
QoS Class: Guaranteed
Node-Selectors: beta.kubernetes.io/arch=amd64
Tolerations: :NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message


Normal Scheduled 6m57s default-scheduler Successfully assigned kube-system/kube-flannel-ds-amd64-gm5z7 to darshan-p-hegde-89ca8c531.mylabserver.com
Normal Pulling 6m57s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com pulling image "quay.io/coreos/flannel:v0.10.0-amd64"
Normal Pulled 6m55s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Successfully pulled image "quay.io/coreos/flannel:v0.10.0-amd64"
Normal Created 6m55s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Created container
Normal Started 6m55s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Started container
Normal Started 6m5s (x4 over 6m53s) kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Started container
Normal Pulled 5m11s (x5 over 6m54s) kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Container image "quay.io/coreos/flannel:v0.10.0-amd64" already present on machine
Normal Created 5m11s (x5 over 6m53s) kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Created container
Warning BackOff 105s (x23 over 6m48s) kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Back-off restarting failed container

Please find the output of the coredns pods:

Warning FailedCreatePodSandBox 7m50s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5f6770d9dfcb53738a0dd428b86e815d4d85e9b71a76d17b10b1f764f102fb61" network for pod "coredns-576cbf47c7-9r27x": NetworkPlugin cni failed to set up pod "coredns-576cbf47c7-9r27x_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 7m49s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "009e9e0099f993086300649a89995a28a0fdf1a128863f7a71e3ff1973788c26" network for pod "coredns-576cbf47c7-9r27x": NetworkPlugin cni failed to set up pod "coredns-576cbf47c7-9r27x_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 7m48s kubelet, darshan-p-hegde-89ca8c531.mylabserver.com Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ea0ddaf5c411dd026cfd23366e49424526b7cc547652ca262a346f4c800f0c04" network for pod "coredns-576cbf47c7-9r27x": NetworkPlugin cni failed to set up pod "coredns-576cbf47c7-9r27x_kube-system" network: open /run/flannel/subnet.env: no such file or directory

@hegdedarsh possible that it is a different problem, but I would suggest using a released version https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml, modify the tolerations and give it a try.

This fixes the issue for me. Thanks for the PR!

Adding the toleration in the Flannel yaml works for me also. Tested on v1.12.1 Kubernetes. Thanks.

I am using the yaml file recommended in this issue, but NodePort and externalIPs no longer work for me unless the request comes from the same node the pods are on. If I try to telnet via the master IP, I get a timeout.
This started with the upgrade to Kubernetes 1.12.

Is this a problem with flannel?

I am on a fresh install of k8s 1.12 and have just tried downloading v0.10.0; the toleration below seems to exist already. So I applied the yml:

      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule

It tried creating the flannel pod, but it came up with 'Error' and eventually 'CrashLoopBackOff'.
Still very new to k8s; let me know what debug info I can provide.

Just here to say that using https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml with the tolerations set as below works on Kubernetes 1.12.3 with a kubeadm install:

      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoSchedule

Thanks, it worked for me after applying the above changes to the flannel config on Kubernetes v1.12.3.

There hasn't been a release of flannel for a year and we need to upgrade to Kubernetes 1.12.

Are there plans to have a new release anytime soon? If not, it's not a problem, we can always branch and fix it ourselves.

Thanks


There is a release planned soon. Can we have a PR that updates kube-flannel.yml with the correct tolerations?

Thanks!

Wasn't it fixed here? 13a990b

I can certify that with the latest release v0.11.0, flannel works with kubernetes 1.12.5 out of the box :)

Thanks!

Wasn't it fixed here? 13a990b

Yes, although you must know the commit to fetch the fixed manifest. Typically, I obtain the manifest by using the tag, e.g. for v0.10.0, I use

https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml

Of course, the manifest does not include the fix, since it is the manifest that existed when v0.10.0 was released.

I humbly ask the maintainers to consider making fixes like this easier to find. πŸ™‚

(In my experience, a common way to make such fixes easy to find is to cherry-pick them to a release branch. I realize the flannel repo does not use release branches. I don't have insight into why that's the case.)

For anyone who wants to patch the v0.10.0 DaemonSet to tolerate all taints with the NoSchedule effect:

kubectl -nkube-system patch ds kube-flannel-ds --patch='{"spec":{"template":{"spec":{"tolerations":[{"effect":"NoSchedule","operator":"Exists"}]}}}}'
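If you'd rather build that patch body programmatically (say, to add more tolerations later), here is a small sketch; the field paths follow the DaemonSet pod-template schema, and the `json` module just keeps the quoting correct:

```python
import json

# Same strategic-merge patch as the one-liner above, built as a dict
# so additional tolerations can be appended before serializing.
patch = {"spec": {"template": {"spec": {"tolerations": [
    {"effect": "NoSchedule", "operator": "Exists"},
]}}}}
cmd = ("kubectl -n kube-system patch ds kube-flannel-ds --patch='%s'"
       % json.dumps(patch, separators=(",", ":")))
print(cmd)
```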

I strongly disagree that flannel should tolerate all taints, because there are nodes it should certainly not run on (e.g. Windows nodes).
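One way to reconcile "tolerate all taints" with "never run on Windows nodes" is to pair the toleration with a nodeSelector: tolerations only lift taint-based barriers, while selectors positively restrict placement. A toy model of the combined predicate (heavily simplified; the `beta.kubernetes.io/os` label is assumed here as the convention of that era, and only a bare `operator: Exists` toleration is modelled):

```python
def schedulable(node_labels, node_taints, pod_spec):
    """Simplified predicate: every nodeSelector entry must match a node
    label, AND every taint must be tolerated (here modelled only for a
    bare `operator: Exists` toleration, which tolerates all taints)."""
    selector_ok = all(node_labels.get(k) == v
                      for k, v in pod_spec.get("nodeSelector", {}).items())
    tolerates_all = any(t.get("operator") == "Exists" and not t.get("key")
                        for t in pod_spec.get("tolerations", []))
    return selector_ok and (tolerates_all or not node_taints)

pod = {"tolerations": [{"operator": "Exists"}],
       "nodeSelector": {"beta.kubernetes.io/os": "linux"}}
taints = [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]
print(schedulable({"beta.kubernetes.io/os": "linux"}, taints, pod))    # True
print(schedulable({"beta.kubernetes.io/os": "windows"}, taints, pod))  # False
```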

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.