Pod unable to reach itself through a service (unless --cni=true is set)
arrawatia opened this issue · 58 comments
Minikube version (use minikube version):
minikube version: v0.16.0 and k8s version v1.6.4
(But I tried v0.17.1 and v0.19.1 too)
Environment:
- OS (e.g. from /etc/os-release):
- VM Driver: VirtualBox
- ISO version: minikube-v1.0.6.iso
- Install tools:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.16.0/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
- Others:
What happened:
If a pod has a service which points to the pod, the pod cannot reach itself through the service IP. Other pods can reach the service, and the pod itself can reach other services. This means all components (especially clustered and distributed systems) which expect to talk to themselves for leader election fail to start up properly.
What you expected to happen:
I expect the pod to be able to reach itself.
How to reproduce it (as minimally and precisely as possible):
It happens with all our services and pods but I can reproduce it with kube-system pods too.
Get the service IP: kubectl describe svc kube-dns --namespace kube-system | grep IP:. I get 10.0.0.10.
Get the endpoint IP: kubectl describe svc kube-dns --namespace kube-system | grep Endpoints. I get 172.17.0.3.
Exec into the pod:
kubectl --namespace kube-system exec -it kube-dns-v20-54536 sh
Run the following:
Using the service IP hangs:
/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local 10.0.0.10
Server: 10.0.0.10
^C
Using the endpoint IP works:
/ # nslookup kubernetes-dashboard.kube-system.svc.cluster.local 172.17.0.3
Server: 172.17.0.3
Address 1: 172.17.0.3 kube-dns-v20-54536
Name: kubernetes-dashboard.kube-system.svc.cluster.local
Address 1: 10.0.0.212 kubernetes-dashboard.kube-system.svc.cluster.local
Accessing a different service IP works. Using the kubernetes-dashboard IP from the last command:
/ # telnet 10.0.0.212 80
get
HTTP/1.1 400 Bad Request
Content-Type: text/plain
Connection: close
400 Bad Request
Connection closed by foreign host
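The hairpin_mode check discussed later in this thread can be scripted as follows. This is a diagnostic sketch, not part of the original report, meant to be run inside minikube ssh; a pod can generally only reach itself through a service IP if hairpin mode is 1 on its veth bridge port, or if the docker0 bridge is promiscuous:

```shell
# Diagnostic sketch (run inside `minikube ssh`); assumes the bridge is docker0.
BRIDGE=/sys/devices/virtual/net/docker0
if [ -d "$BRIDGE/brif" ]; then
  # Print hairpin mode for each veth attached to the bridge (1 = enabled).
  for f in "$BRIDGE"/brif/veth*/hairpin_mode; do
    printf '%s: %s\n' "$f" "$(cat "$f")"
  done
  # Show whether the bridge itself is promiscuous (the workaround below).
  ip link show docker0 | grep -o PROMISC || echo "docker0 is not promiscuous"
else
  echo "docker0 bridge not present (run this inside 'minikube ssh')"
fi
```

If every veth reports 0 and docker0 is not promiscuous, hairpin traffic (pod to itself via a service IP) will be dropped by the bridge.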
Anything else we need to know:
minikube v0.17.1 works with K8S 1.5.3
I tried the following and it worked. So I suspect it has something to do with upgrading minikube to v0.17.1 and Kubernetes to v1.6.4:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.17.1/minikube-darwin-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
rm -rf ~/.minikube
minikube start --kubernetes-version 1.5.3 --cpus 4 --memory 6096 --v=8 --logtostderr
It seems it's related to this: kubernetes/kubernetes#19930 and this kubernetes/kubernetes#20475
For me, this helped to fix it kubernetes/kubernetes#20475 (comment)
So you can do:
minikube ssh
sudo ip link set docker0 promisc on
Maybe this fix can be merged directly into minikube so people won't need to do custom things?
@ursuad It worked for me too (minikube 0.19.1 and k8s v1.6.4). Thanks a lot for your help. :)
@arrawatia No problem, but maybe you should leave this issue open, so we'll have a longer-term fix merged into minikube.
This is a bug, and my fix is just a workaround.
Not sure if there is a page to capture minikube gotchas; @ursuad's suggestion should be there.
Use a Headless Service as a workaround (clusterIP: None). StatefulSets that talk to themselves, like Kafka, use headless services anyway.
This might still be considered a bug though.
@kubernetes/sig-minikube
I'll take a look at this
@arrawatia can you try with minikube with k8s 1.5.1 I am almost certain it will work in 1.5.1
I have that problem ! #1690
Seeing this issue on minikube 0.21.0 and kube 1.70 and 1.7.2
The workaround posted by @ursuad seems to solve the issue for me
I'm going to open this up again, since we ended up reverting the kubenet change.
Just updating the status for minikube 0.22.0 - the issue is still present
Still seeing this in minikube v0.23.0
... and v0.24.0...
... and v0.24.1
Below commands are NOT working for me. Still getting '0' for cat /sys/devices/virtual/net/docker0/brif/veth*/hairpin_mode
minikube ssh
sudo ip link set docker0 promisc on
Any other solution or why it is not working while for others it is? Thanks.
minikube: 0.24.1
Kubernetes: 1.8
Win 10 Pro x64
Virtualbox 5.1.30
sudo ip link set docker0 promisc on
fixed the issue for me. Saw it on minikube version: v0.24.1
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten
.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close
.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen
.
Mark the issue as fresh with /remove-lifecycle rotten
.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
This is not solved, right? Saw it on minikube version: v0.28.2
can this be reopened please?
/reopen
/remove-lifecycle rotten
@nyetwurk: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
/remove-lifecycle rotten
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@arrawatia Can you reopen this? Thanks.
This issue is still there on the 1.0.0 version. To me this is a major issue. Shall I create another issue or can someone reopen this one ?
Same. Auto-stale bots are a cancer, IMO
the problem is still alive. Is surely a major issue.
It seems that this was being tracked by a second subsequent bug (#2460) which I've de-duped into this one.
I agree that the behavior isn't as users would expect, and would be more than happy to review any PR's which address this. Help wanted!
By the way the proposed workaround works :
minikube ssh
sudo ip link set docker0 promisc on
@sgandon , yes, is workaround but is not a fix. I used as workaround an hostAliases in the deployment with the same name of the service (i use minikube in the development environment, so for my case, can be enough), also this workaround works, but is not a fix to the problem.
@tstromberg thanks for resurrecting this.
It seems that this was being tracked by a second subsequent bug (#2460) which I've de-duped into this one.
I agree that the behavior isn't as users would expect, and would be more than happy to review any PR's which address this. Help wanted!
Additionally I would like understand why the promiscuous workaround works (as it did for me too).
I've been looking at the detailed networking descriptions for docker and minikube and from what I understand everything (virtual interfaces veth and so on) is connected to the docker0 bridge and then iptable rules inside the minikube virtualbox VM 'routes' traffic on an IP level at least.
From my limited understanding there exists one NIC for minikube, w.r.t. docker and kubernetes, and that is provisioned by virtualbox (so the NIC itself is virtualized in some manner right) which leads to the conclusion that there exists one MAC address for the one NIC. So how can the promiscuous mode on the one NIC matter, as that mode deals with MAC filtering as far as I understand?
A small guess would be that it matters that the virtualized NIC is connected to a real NIC - but how can a command inside minikube affect the NIC on the host that the minikube VM is running on, as that seems to imply that the VM is able to change things on the host -> should be a big security no no.
There is something missing in my understanding and I would like to know why promiscuous mode makes things work.
This feels like a dumb question, but is this fixed in Minikube now? (And if so, which branch/release?) I don't see it in the log, but perhaps there's some bundling with Istio that I'm not aware of?
Sorry for taking bandwidth, I'm just trying to figure this out.
Has anyone tried this in Minikube 1.4.0 yet?
+1 to merge this
@dgoldssfo Yup facing this issue in the Minikube 1.4.0.
Based on what I know, configuring minikube to use a CNI fixes this. #6094 will likely solve this for the default case. Help wanted!
@tstromberg I've tried to test this issue through knative/eventing#2039 enabling a CNI:
minikube start --network-plugin=cni --enable-default-cni
But it still doesn't work.
minikube version: v1.5.2
commit: 792dbf92a1de583fcee76f8791cff12e0c9440ad-dirty
Based on this, I think minikube should always enable a CNI nowadays. We already do so if minikube start --driver=docker
is used.
Neither
minikube start --network-plugin=cni --enable-default-cni
nor
sudo ip link set docker0 promisc on
works.
minikube v1.10.1 on Ubuntu 18.04
Any other workarounds?
We're considering always enabling CNI in minikube, particularly now that it supports multi-node. The other workaround is
minikube start --driver=docker
, for which we always enable CNI.
โฆ
On Wed, Jun 3, 2020 at 5:16 AM Hrishikesh Barua @.***> wrote: Neither minikube start --network-plugin=cni --enable-default-cni nor sudo ip link set docker0 promisc on works. minikube v1.10.1 on Ubuntu 18.04 Any other workarounds? โ You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#1568 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAYYMG6J57NM2PARMEJQ5DRUY5JFANCNFSM4DOSC5PQ .
Thank you - I'll try driver=docker.
the same problem here ๐คฆ
minikube version: v1.11.0
Fixing this by default appears to incur a ~30% performance penalty for startup, which makes me quite wary of imposing it on the users who do not care about CNI.
At a minimum though, we should document that it is possible to now say --cni=true
to make this work.
Have tested this with docker driver, running a service of two pods with simple nginx binary. Able to get response using service dns name ..svc.cluster.local from the pods.
Since docker driver will enable cni by default, so using docker driver won't encounter this issue.
Also tested on kvm driver, doesn't encounter this issue when ping <service_name>..svc.cluster.local.
./out/minikube start
create nginx service
kubectl apply -f template/nginx.yaml
kubectl expose deployment my-nginx
ssh into pods
kubectl get pods
Output:
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-nginx-5b56ccd65f-hpktn 1/1 Running 0 45s
default my-nginx-5b56ccd65f-zqc72 1/1 Running 0 45s
kubectl exec -it my-nginx-5b56ccd65f-hpktn -- /bin/bash
curl my-nginx.default.svc.cluster.local
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.
Thank you for using nginx.
Nginx template:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-nginx
spec:
selector:
matchLabels:
run: my-nginx
replicas: 2
template:
metadata:
labels:
run: my-nginx
spec:
containers:
- name: my-nginx
image: nginx
ports:
- containerPort: 80
thanks @azhao155 for confirming this issue doesnt happen anymore.
closing please feel free to re-open if still see this problem
@belfo : please open a new issue.
I'm finding @azhao155's test works when the replicas are set to two but the test fails if I run
kubectl scale deploy/my-nginx --replicas=1
(or set the replicas manually in the YAML). It works if the service is headless. Is this the expected behaviour? I'm using the docker driver btw.
minikube version: v1.24.0
commit: 76b94fb3c4e8ac5062daf70d60cf03ddcc0a741b
Still seeing this issue on Minikube 1.27 using hyperkit.
--cni=true set up the CNI, but still did not allow a Pod to connect to itself through a service.
sudo ip link set docker0 promisc in the Minikube VM did fix the behavior.
Also seeing this on 1.28 with Hyper-V driver.
The workaround (sudo ip link set docker0 promisc on
) fixes it, but is rather inconvenient.