kubernetes/minikube

Pod unable to reach itself through a service (unless --cni=true is set)

arrawatia opened this issue · 58 comments

Minikube version (use minikube version):
minikube version: v0.16.0 and k8s version v1.6.4
(But I tried v0.17.1 and v0.19.1 too)
Environment:

  • OS (e.g. from /etc/os-release):
  • VM Driver:
    VirtualBox
  • ISO version:
    minikube-v1.0.6.iso
  • Install tools:
    curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.16.0/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
  • Others:

What happened:
If a pod has a service which points to the pod, the pod cannot reach itself through the service IP. Other pods can reach the service, and the pod itself can reach other services. This means all components (especially clustered and distributed systems) that expect to talk to themselves for leader election fail to start up properly.

What you expected to happen:
I expect the pod to be able to reach itself.

How to reproduce it (as minimally and precisely as possible):
It happens with all our services and pods but I can reproduce it with kube-system pods too.

Get service IP : kubectl describe svc kube-dns --namespace kube-system | grep IP:. I get 10.0.0.10
Get endpoint IP: kubectl describe svc kube-dns --namespace kube-system | grep Endpoints. I get 172.17.0.3
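(As a side note, the same two addresses can be pulled out with jsonpath; a sketch, using the same service and namespace as above:)

# service (cluster) IP
kubectl get svc kube-dns --namespace kube-system -o jsonpath='{.spec.clusterIP}'
# endpoint (pod) IP
kubectl get endpoints kube-dns --namespace kube-system -o jsonpath='{.subsets[0].addresses[0].ip}'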

Exec into the pod:
kubectl --namespace kube-system exec -it kube-dns-v20-54536 sh

Run the following. Using the service IP hangs:

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local 10.0.0.10
Server:    10.0.0.10
^C

Using the endpoint IP works:

/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local 172.17.0.3
Server: 172.17.0.3
Address 1: 172.17.0.3 kube-dns-v20-54536

Name: kubernetes-dashboard.kube-system.svc.cluster.local
Address 1: 10.0.0.212 kubernetes-dashboard.kube-system.svc.cluster.local

Accessing a different service IP works. Using the kubernetes-dashboard IP from the last command:
/ # telnet 10.0.0.212 80
get
HTTP/1.1 400 Bad Request
Content-Type: text/plain
Connection: close

400 Bad Request
Connection closed by foreign host

Anything else we need to know:
minikube v0.17.1 works with K8s 1.5.3.
I tried the following and it worked, so I suspect the problem has something to do with upgrading Kubernetes to v1.6.4 (the same minikube v0.17.1 works fine with 1.5.3).

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.17.1/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
rm -rf ~/.minikube
minikube start --kubernetes-version 1.5.3 --cpus 4 --memory 6096 --v=8 --logtostderr

It seems it's related to this: kubernetes/kubernetes#19930 and this kubernetes/kubernetes#20475

For me, this helped to fix it kubernetes/kubernetes#20475 (comment)

So you can do:

minikube ssh
sudo ip link set docker0 promisc on

Maybe this fix can be merged directly into minikube so people won't need to do custom things?
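For anyone scripting this, the same workaround can be run non-interactively from the host (a sketch; the setting is not persistent, so it typically has to be repeated after each minikube start):

minikube ssh "sudo ip link set docker0 promisc on"
# verify the PROMISC flag is now set on the bridge
minikube ssh "ip link show docker0" | grep PROMISC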

@ursuad It worked for me too (minikube 0.19.1 and k8s v1.6.4). Thanks a lot for your help. :)

@arrawatia No problem, but maybe you should leave this issue open, so we'll have a longer term fix merged in minkube.
This is a bug, and my fix is just a workaround.

Not sure if there is a page to capture minikube gotchas. @ursuad's suggestion should be there

Use a Headless Service as a workaround (clusterIP: None). StatefulSets that talk to themselves, like Kafka, use headless services anyway.
This might still be considered a bug though.
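A minimal sketch of that headless-service workaround (the names, labels, and port here are hypothetical; the key line is clusterIP: None, which makes DNS return the pod IPs directly instead of a virtual service IP):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical service name
spec:
  clusterIP: None         # headless: DNS resolves to the pod IPs, bypassing the service VIP
  selector:
    app: my-app           # must match the pod labels
  ports:
  - port: 8080            # hypothetical port
EOF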

@kubernetes/sig-minikube

r2d4 commented

I'll take a look at this

@r2d4 I am pretty sure this is the same as #1690

@arrawatia can you try minikube with k8s 1.5.1? I am almost certain it will work in 1.5.1.
I have that problem! #1690

Seeing this issue on minikube 0.21.0 and kube 1.7.0 and 1.7.2

The workaround posted by @ursuad seems to solve the issue for me

r2d4 commented

I'm going to open this up again, since we ended up reverting the kubenet change.

r2d4 commented

ref #1742

Just updating the status for minikube 0.22.0 - the issue is still present

Still seeing this in minikube v0.23.0

... and v0.24.0...

... and v0.24.1

The commands below are NOT working for me. I am still getting '0' from cat /sys/devices/virtual/net/docker0/brif/veth*/hairpin_mode.

minikube ssh
sudo ip link set docker0 promisc on

Is there any other solution, or any idea why it is not working for me while it works for others? Thanks.

minikube: 0.24.1
Kubernetes: 1.8
Win 10 Pro x64
Virtualbox 5.1.30
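A quick diagnostic sketch for this case (run inside minikube ssh). Note that ip link set docker0 promisc on only sets the PROMISC flag on the bridge device itself; it does not change the per-port hairpin_mode files, so a 0 there does not by itself mean the workaround failed to apply:

# does the bridge have the PROMISC flag?
ip link show docker0
# per-port hairpin setting (independent of the promisc flag)
cat /sys/devices/virtual/net/docker0/brif/veth*/hairpin_mode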

sudo ip link set docker0 promisc on
fixed the issue for me. Saw it on minikube version: v0.24.1

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

remoe commented

This is not solved, right? Saw it on minikube version: v0.28.2

can this be reopened please?

/reopen
/remove-lifecycle rotten

@nyetwurk: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@arrawatia Can you reopen this? Thanks.

This issue is still there on the 1.0.0 version. To me this is a major issue. Shall I create another issue, or can someone reopen this one?

Same. Auto-stale bots are a cancer, IMO

The problem is still alive. It is surely a major issue.

It seems that this was being tracked by a second subsequent bug (#2460) which I've de-duped into this one.

I agree that the behavior isn't as users would expect, and would be more than happy to review any PR's which address this. Help wanted!

By the way the proposed workaround works :

minikube ssh
sudo ip link set docker0 promisc on

@sgandon, yes, it is a workaround but it is not a fix. As a workaround I used hostAliases in the deployment with the same name as the service (I use minikube in the development environment, so for my case that can be enough). This workaround also works, but it is not a fix for the problem.
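For reference, a sketch of what that hostAliases workaround can look like, assuming (my guess) the service name is mapped to 127.0.0.1 so the pod talks to itself directly instead of going through the service VIP; the deployment and service names here are hypothetical:

# hypothetical names: deployment "my-app", service "my-service" in namespace "default"
kubectl patch deployment my-app --type merge -p \
  '{"spec":{"template":{"spec":{"hostAliases":[
     {"ip":"127.0.0.1",
      "hostnames":["my-service","my-service.default.svc.cluster.local"]}]}}}}'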

@tstromberg thanks for resurrecting this.

kwa commented

It seems that this was being tracked by a second subsequent bug (#2460) which I've de-duped into this one.

I agree that the behavior isn't as users would expect, and would be more than happy to review any PR's which address this. Help wanted!

Additionally, I would like to understand why the promiscuous workaround works (as it did for me too).
I've been looking at the detailed networking descriptions for Docker and minikube, and from what I understand everything (virtual interfaces, veths, and so on) is connected to the docker0 bridge, and then iptables rules inside the minikube VirtualBox VM 'route' traffic, at the IP level at least.

From my limited understanding there exists one NIC for minikube, w.r.t. docker and kubernetes, and that is provisioned by virtualbox (so the NIC itself is virtualized in some manner right) which leads to the conclusion that there exists one MAC address for the one NIC. So how can the promiscuous mode on the one NIC matter, as that mode deals with MAC filtering as far as I understand?

A small guess would be that it matters that the virtualized NIC is connected to a real NIC - but how can a command inside minikube affect the NIC on the host that the minikube VM is running on, as that seems to imply that the VM is able to change things on the host -> should be a big security no no.

There is something missing in my understanding and I would like to know why promiscuous mode makes things work.
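One note that may clear part of this up: docker0 inside the minikube VM is a Linux software bridge with one veth port per container, not the VirtualBox NIC, so the promisc setting is applied entirely inside the VM and never touches the host. Roughly speaking, with the bridge in promiscuous mode it will deliver the DNAT'ed frame back out of the veth it arrived on even though hairpin_mode is off on that port; this is the same idea as kubelet's promiscuous-bridge hairpin mode. A quick way to see the topology (a sketch, run inside minikube ssh):

# detailed link info; the output identifies docker0 as a bridge, not a physical NIC
ip -d link show docker0
# the per-container veth ports attached to that bridge
ls /sys/class/net/docker0/brif/
# the VirtualBox-provided NICs (eth0/eth1) are separate devices
ip link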

This feels like a dumb question, but is this fixed in Minikube now? (And if so, which branch/release?) I don't see it in the log, but perhaps there's some bundling with Istio that I'm not aware of?

Sorry for taking bandwidth, I'm just trying to figure this out.

Encountered this bug on

minikube version: v1.3.1
commit: ca60a42

Has anyone tried this in Minikube 1.4.0 yet?

+1 to merge this

@dgoldssfo Yup facing this issue in the Minikube 1.4.0.

Based on what I know, configuring minikube to use a CNI fixes this. #6094 will likely solve this for the default case. Help wanted!

@tstromberg I've tried to test this issue through knative/eventing#2039 enabling a CNI:

minikube start --network-plugin=cni --enable-default-cni

But it still doesn't work.

minikube version: v1.5.2
commit: 792dbf92a1de583fcee76f8791cff12e0c9440ad-dirty

Based on this, I think minikube should always enable a CNI nowadays. We already do so if minikube start --driver=docker is used.

Neither
minikube start --network-plugin=cni --enable-default-cni
nor
sudo ip link set docker0 promisc on
works.

minikube v1.10.1 on Ubuntu 18.04

Any other workarounds?

We're considering always enabling CNI in minikube, particularly now that it supports multi-node. The other workaround is minikube start --driver=docker, for which we always enable CNI.

Thank you - I'll try driver=docker.

The same problem here 🤦

minikube version: v1.11.0

Fixing this by default appears to incur a ~30% performance penalty for startup, which makes me quite wary of imposing it on the users who do not care about CNI.

At a minimum though, we should document that it is possible to now say --cni=true to make this work.
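For anyone trying it, that would look roughly like this (a sketch; changing the networking setup generally requires recreating the cluster, hence the delete first):

minikube delete
minikube start --cni=true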

Have tested this with the docker driver, running a service of two pods with a simple nginx binary. I was able to get a response using the service DNS name <service_name>.<namespace>.svc.cluster.local from the pods.
Since the docker driver enables CNI by default, using the docker driver won't encounter this issue.

Also tested on the kvm driver; I don't encounter this issue when pinging <service_name>.<namespace>.svc.cluster.local.

./out/minikube start

Create the nginx deployment and expose it as a service:

kubectl apply -f template/nginx.yaml
kubectl expose deployment my-nginx

List the pods and exec into one of them:

kubectl get pods
Output:
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-nginx-5b56ccd65f-hpktn 1/1 Running 0 45s
default my-nginx-5b56ccd65f-zqc72 1/1 Running 0 45s

kubectl exec -it my-nginx-5b56ccd65f-hpktn -- /bin/bash
curl my-nginx.default.svc.cluster.local

(The response is the standard nginx welcome page: "Welcome to nginx! If you see this page, the nginx web server is successfully installed and working. Further configuration is required. For online documentation and support please refer to nginx.org. Commercial support is available at nginx.com. Thank you for using nginx.")

Nginx template:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80

Thanks @azhao155 for confirming this issue doesn't happen anymore.

Closing. Please feel free to re-open if you still see this problem.

belfo commented

@medyagh @azhao155
I'm trying to use driver=none but that doesn't work.
Could you help?

@belfo : please open a new issue.

I'm finding @azhao155's test works when the replicas are set to two but the test fails if I run
kubectl scale deploy/my-nginx --replicas=1 (or set the replicas manually in the YAML). It works if the service is headless. Is this the expected behaviour? I'm using the docker driver btw.

minikube version: v1.24.0
commit: 76b94fb3c4e8ac5062daf70d60cf03ddcc0a741b
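For reference, a sketch of that single-replica check (names follow the nginx example earlier in the thread; with the hairpin problem present, the curl is expected to hang or time out):

kubectl scale deploy/my-nginx --replicas=1
kubectl wait --for=condition=available deploy/my-nginx
# the only remaining pod now calls its own service
kubectl exec -it deploy/my-nginx -- curl -sS --max-time 5 my-nginx.default.svc.cluster.local \
  || echo "timed out: the pod cannot reach itself through the service"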

Still seeing this issue on Minikube 1.27 using hyperkit.

--cni=true set up the CNI, but still did not allow a Pod to connect to itself through a service.

sudo ip link set docker0 promisc on in the Minikube VM did fix the behavior.

Also seeing this on 1.28 with Hyper-V driver.
The workaround (sudo ip link set docker0 promisc on) fixes it, but is rather inconvenient.