Traffic loss with externalTrafficPolicy:Local and proxy-mode=ipvs
Closed this issue · 9 comments
What happened:
When a service with externalTrafficPolicy:Local is used with proxy-mode=ipvs, only 1/(number of nodes) of the traffic gets through. The service has "type: LoadBalancer" and the loadBalancerIP is used for access. If the service is accessed through the NodePort, it works as expected.
The problem is that kube-proxy sets up targets for all endpoints in ipvs for the loadBalancerIP while still disabling SNAT. So all traffic that does not happen to hit a node-local endpoint is lost. Example;
> ipvsadm -L -n # (narrowed)
TCP  192.168.1.4:32713 rr
  -> 11.0.4.3:8080  Masq  1  0  0
TCP  10.0.0.0:8080 rr
  -> 11.0.1.2:8080  Masq  1  0  0
  -> 11.0.2.2:8080  Masq  1  0  0
  -> 11.0.3.2:8080  Masq  1  0  0
  -> 11.0.4.3:8080  Masq  1  0  0
Here the lbIP 10.0.0.0 gets all endpoints as targets (wrong!), but the NodePort entry (32713) gets only the local endpoint as a target (correct!).
The service manifest:
apiVersion: v1
kind: Service
metadata:
  name: mconnect-local
spec:
  selector:
    app: mconnect
  ports:
  - name: mconnect
    port: 5001
  - name: http
    port: 8080
  externalTrafficPolicy: Local
  type: LoadBalancer
MetalLB is used to obtain the lbIP.
What you expected to happen:
The targets for the loadBalancerIP should be only the pods running on the local node, the same as for the NodePort, and no traffic should be lost when using the loadBalancerIP.
How to reproduce it (as minimally and precisely as possible):
- Use k8s with proxy-mode=ipvs
- Start a service with externalTrafficPolicy:Local
- Try to access the service via the loadBalancerIP
- Run ipvsadm -L -n on a node and compare the entries for the loadBalancerIP and the NodePort
Anything else we need to know?:
Public clouds using an external LB that distributes to the NodePort will not see this bug.
I usually run on master, and the problem exists on v1.14.0-alpha.0. I went back to 1.12 to check whether it was a new problem.
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T17:05:32Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T16:55:41Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: No cloud provider; xcluster on Ubuntu 18.04
- OS (e.g. from /etc/os-release): Xcluster
- Kernel (e.g. uname -a):
Linux vm-004 4.18.5 #1 SMP Fri Nov 30 08:47:18 CET 2018 x86_64 GNU/Linux
- Install tools:
- Others:
Test using mconnect
/kind bug
/sig network
/area ipvs
Could you please check out HEAD and then give us the test result?
/cc @Lion-Wei
On version:
Client Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0-alpha.0.789+dde084fc557aa1", GitCommit:"dde084fc557aa16b5142d76bc62f10dcf2383e73", GitTreeState:"clean", BuildDate:"2018-12-03T07:44:35Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"linux/amd64"}
the problem still persists:
TCP  10.0.0.0:5001 rr
  -> 11.0.1.2:5001  Masq  1  6  0
  -> 11.0.2.3:5001  Masq  1  6  0
  -> 11.0.3.2:5001  Masq  1  7  0
  -> 11.0.4.2:5001  Masq  1  0  7
Above is an ipvsadm -L -n from one node after a test. 7 connects are OK (the local ones); the others are lost.
Seems you are right, but I am not sure whether it will break other e2e tests. I remember there is a hack for LB type services? I think we can merge the fix if we can confirm that no e2e tests will be affected. Unfortunately, many service/network related e2e tests are not run as presubmits.
@m1093782566 This is in code only used for proxy-mode=ipvs; is it possible to limit the e2e tests to that case? Not a final check, but a faster way to see whether any e2e tests break.
OK I think the issue is here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L932
This one is an easy fix
Also, I'm not sure about the logic here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L1032
It seems to me that if externalTrafficPolicy is Local, we should only include local endpoints regardless of sessionAffinity. @m1093782566 and @Lion-Wei, what do you think? (I may have completely missed something here.)
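To make the proposed fix concrete: the idea is to apply to the loadBalancerIP entry the same local-endpoint filter that kube-proxy already applies to the NodePort entry. The sketch below is not the actual proxier.go code; the `endpoint` type, its fields, and `filterLocal` are invented stand-ins for kube-proxy's internal endpoint info, kept minimal to show just the filtering logic.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for kube-proxy's internal
// endpoint info; the fields are invented for illustration.
type endpoint struct {
	addr    string
	isLocal bool
}

// filterLocal returns only the endpoints on this node when the
// service has externalTrafficPolicy: Local; otherwise it returns
// all endpoints unchanged. Per the discussion above, this filter is
// already applied for the NodePort entry and should also be applied
// for the loadBalancerIP entry.
func filterLocal(eps []endpoint, onlyNodeLocal bool) []endpoint {
	if !onlyNodeLocal {
		return eps
	}
	var local []endpoint
	for _, ep := range eps {
		if ep.isLocal {
			local = append(local, ep)
		}
	}
	return local
}

func main() {
	// Endpoints as seen from the node hosting 11.0.4.3, matching
	// the ipvsadm output in the report.
	eps := []endpoint{
		{"11.0.1.2:8080", false},
		{"11.0.2.2:8080", false},
		{"11.0.3.2:8080", false},
		{"11.0.4.3:8080", true}, // the endpoint on this node
	}
	// With Local policy, only the node-local endpoint should be
	// programmed as an IPVS destination for the loadBalancerIP.
	for _, ep := range filterLocal(eps, true) {
		fmt.Println(ep.addr)
	}
	// Output: 11.0.4.3:8080
}
```

With this filter in place, the loadBalancerIP entry on each node would list only that node's endpoints, so no non-SNATed traffic is forwarded off-node and lost.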