kubernetes/kubernetes

Traffic loss with externalTrafficPolicy:Local and proxy-mode=ipvs


What happened:

When a service with externalTrafficPolicy:Local is used with proxy-mode=ipvs, only 1/nodes of the traffic gets through. The service has "type: LoadBalancer" and the loadBalancerIP is used for access.

If the service is accessed through the NodePort it works as expected.

The problem is that kube-proxy sets up IPVS targets for all endpoints for the loadBalancerIP while still disabling SNAT, so all traffic that does not happen to hit the local node is lost. Example:

> ipvsadm -L -n   # (narrowed)
TCP  192.168.1.4:32713 rr
  -> 11.0.4.3:8080                Masq    1      0          0         
TCP  10.0.0.0:8080 rr
  -> 11.0.1.2:8080                Masq    1      0          0         
  -> 11.0.2.2:8080                Masq    1      0          0         
  -> 11.0.3.2:8080                Masq    1      0          0         
  -> 11.0.4.3:8080                Masq    1      0          0         

Here the lbIP 10.0.0.0 gets all endpoints as targets (wrong!) but the NodePort entry (32713) gets only the local endpoint as target (correct!).
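To illustrate what I mean, here is a small standalone Go sketch that reproduces the asymmetry seen above: both virtual servers are built from the same endpoint list, but only the NodePort path is filtered to local endpoints. The types and names are made up for this example; this is not kube-proxy code.

package main

import "fmt"

// endpoint is a made-up, simplified endpoint: pod IP plus the node it runs on.
type endpoint struct {
    ip   string
    node string
}

// localOnly keeps only the endpoints that run on the given node.
func localOnly(eps []endpoint, node string) []endpoint {
    var out []endpoint
    for _, ep := range eps {
        if ep.node == node {
            out = append(out, ep)
        }
    }
    return out
}

func main() {
    thisNode := "vm-004"
    eps := []endpoint{
        {"11.0.1.2", "vm-001"},
        {"11.0.2.2", "vm-002"},
        {"11.0.3.2", "vm-003"},
        {"11.0.4.3", "vm-004"},
    }

    // NodePort path: filtered to local endpoints (matches the 32713 entry above).
    fmt.Println("NodePort targets:", localOnly(eps, thisNode))

    // loadBalancerIP path as observed: all endpoints are used (the bug),
    // although with externalTrafficPolicy: Local it should also be localOnly.
    fmt.Println("lbIP targets:    ", eps)
}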

The service manifest:

apiVersion: v1
kind: Service
metadata:
  name: mconnect-local
spec:
  selector:
    app: mconnect
  ports:
  - name: mconnect
    port: 5001
  - name: http
    port: 8080
  externalTrafficPolicy: Local
  type: LoadBalancer

metallb is used to obtain the lbIP.

What you expected to happen:

The targets for the loadBalancerIP should be only the pods executing on the local node, the same as for NodePort, and no traffic should be lost when using the loadBalancerIP.

How to reproduce it (as minimally and precisely as possible):

  • Use k8s with proxy-mode=ipvs
  • Start a service with externalTrafficPolicy:Local
  • Try to access the service via the loadBalancerIP
  • Do ipvsadm -L -n on a node and compare entries for loadBalancerIP and NodePort

Anything else we need to know?:

Public clouds using an external LB that distributes to the NodePort will not see this bug.

I usually run on master and the problem exists on v1.14.0-alpha.0. I went back to 1.12 to see if it was a new problem.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T17:05:32Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-27T16:55:41Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    No cloud provider. xcluster on Ubuntu 18.04
  • OS (e.g. from /etc/os-release):
    Xcluster
  • Kernel (e.g. uname -a):
    Linux vm-004 4.18.5 #1 SMP Fri Nov 30 08:47:18 CET 2018 x86_64 GNU/Linux
  • Install tools:
  • Others:
    Test using mconnect

/kind bug

/sig network
/area ipvs

@uablrek

Could you please check out the HEAD and then give us the test result?

On version:

Client Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0-alpha.0.789+dde084fc557aa1", GitCommit:"dde084fc557aa16b5142d76bc62f10dcf2383e73", GitTreeState:"clean", BuildDate:"2018-12-03T07:44:35Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"linux/amd64"}

The problem still persists:

TCP  10.0.0.0:5001 rr
  -> 11.0.1.2:5001                Masq    1      6          0
  -> 11.0.2.3:5001                Masq    1      6          0 
  -> 11.0.3.2:5001                Masq    1      7          0
  -> 11.0.4.2:5001                Masq    1      0          7

Above is an ipvsadm -L -n from one node after a test. 7 connects are OK (the local ones) and the others are lost.

It seems you are right, but I am not sure if it will break other e2e tests. I remember there is a hack for LB type services? I think we can merge the fix if we can confirm that no e2e tests will be affected. Unfortunately, many service/network related e2e tests are not run as presubmits.

@Lion-Wei

@m1093782566 This is in code used only for proxy-mode=ipvs; is it possible to limit the e2e tests to that case? Not as a final check, but to see a bit faster whether any e2e tests are broken.

#66064 fixed this for node IPs, but I am not sure about the loadBalancerIP.
@uablrek can you confirm that accessing the service on NodeIP:NodePort works but LbIP:port doesn't?

OK I think the issue is here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L932
This one is an easy fix

Also, I'm not sure about the logic here: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L1032
It seems to me that if externalTrafficPolicy is Local we should only include local endpoints regardless of sessionAffinity. @m1093782566 and @Lion-Wei what do you think? (I may have completely missed something here)
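To make the suggestion concrete, here is a hedged standalone Go sketch of the selection order I mean (made-up names and types, not the actual proxier.go logic): the Local policy filter should apply first, and any sessionAffinity handling should then operate on the already-filtered set.

package main

import "fmt"

// Simplified, made-up types for illustration; not kube-proxy's internal types.
type endpoint struct {
    ip    string
    local bool
}

type service struct {
    trafficPolicyLocal bool // externalTrafficPolicy: Local
    sessionAffinity    bool // ClientIP affinity requested
}

// pickEndpoints filters to local endpoints whenever the policy is Local,
// regardless of session affinity; affinity only affects how the (already
// filtered) destinations are later used, not which ones are programmed.
func pickEndpoints(svc service, eps []endpoint) []endpoint {
    if !svc.trafficPolicyLocal {
        return eps
    }
    var out []endpoint
    for _, ep := range eps {
        if ep.local {
            out = append(out, ep)
        }
    }
    return out
}

func main() {
    eps := []endpoint{{"11.0.1.2", false}, {"11.0.4.2", true}}
    for _, affinity := range []bool{false, true} {
        svc := service{trafficPolicyLocal: true, sessionAffinity: affinity}
        // Same result with or without sessionAffinity: only 11.0.4.2 remains.
        fmt.Println("affinity:", affinity, "targets:", pickEndpoints(svc, eps))
    }
}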

@lbernail

Please check/review the pending PR: #71610

I only considered your second reference.