weaveworks/weave

DNS lookup timeouts due to races in conntrack

dcowden opened this issue · 137 comments

What happened?

We are experiencing random 5 second DNS timeouts in our kubernetes cluster.

How to reproduce it?

It is reproducible by requesting just about any in-cluster service and observing that periodically (in our case, 1 out of 50 or 100 times) we get a 5 second delay. It always happens in the DNS lookup.

Anything else we need to know?

We believe this is a result of a kernel level SNAT race condition that is described quite well here:

https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02

The problem also happens with non-weave CNI implementations, and is (ironically) not even a weave issue really. However, it becomes a weave issue, because the solution is to set a flag on the masquerading rules that are created, and those are not in anyone's control except weave's.

What we need is the ability to apply the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag on the masquerading rules that weave sets up. In the above post, Flannel was in use, and the fix was made there instead.

We searched for this issue, and didn't see that anyone had asked for this. We're also unaware of any settings that allow setting this flag today. If that's possible, please let us know.

Whoa! Good job for finding that.

However:

The iptables tool doesn't support setting this flag

this might be an issue.

@bboreham my kernel networking Fu is weak, so I'm not even able to suggest any work arounds. I'm hoping others here have stronger Fu... Challenge proposed!

naysayers frequently make scary, handwavey stability arguments against container stacks. Usually I laugh in the face of danger, but this appears to be the first ever case I've seen in which a little known kernel level gotcha actually does create issues for containers that would otherwise be unlikely to surface

I just spent several hours troubleshooting this problem, ran into the same XING blog post and then this issue report, which was opened while I was troubleshooting!

Anyway, I'm seeing the same issues reported in the XING blog. DNS 5 second delays and a lot of insert_failed counts from conntrack using weave 2.3.0.

cpu=0 found=8089 invalid=353025 ignore=1249480 insert=0 insert_failed=8042 drop=8042 early_drop=0 error=0 search_restart=591166

More details can be provided if needed.
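
For anyone who wants to track that counter over time, here is a minimal Python sketch that sums insert_failed across the per-CPU lines of `conntrack -S` style output. The helper names are made up for illustration, and the sample line is the one quoted above.

```python
import re


def parse_conntrack_stats(text):
    """Parse `conntrack -S` style output into one dict of counters per CPU line."""
    rows = []
    for line in text.splitlines():
        pairs = re.findall(r"(\w+)=(\d+)", line)
        if pairs:
            rows.append({key: int(value) for key, value in pairs})
    return rows


def total(rows, counter):
    """Sum one counter (for example insert_failed) across all CPU lines."""
    return sum(row.get(counter, 0) for row in rows)


# Sample line copied from the comment above.
sample = (
    "cpu=0 found=8089 invalid=353025 ignore=1249480 insert=0 "
    "insert_failed=8042 drop=8042 early_drop=0 error=0 search_restart=591166"
)
rows = parse_conntrack_stats(sample)
print(total(rows, "insert_failed"))  # 8042
```

Running it before and after a batch of lookups and comparing the totals should show whether the counter grows along with the timeouts.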

@btalbot one workaround you might try is to set this option in resolv.conf:

options single-request-reopen

It is a workaround that will basically make glibc retry the lookup, which will work most of the time.

Another band-aid that helps is to change ndots from 5 (the default) to 3, which will generate far fewer requests to your DNS servers and lessen the frequency.

The problem is that it's kind of a pain to force changes into resolv.conf. It's done with the kubelet --resolv-conf option, but then you have to create the whole file yourself, which stinks.

@bboreham it does appear that the patched iptables is available. Can weave use a patched iptables?

The easiest thing is to use an iptables from a released Alpine package. From there it gets progressively harder.

(Sorry for closing/reopening - finger slipped)

BTW my top tip to reduce DNS requests is to put a dot at the end when you know the full address. Eg instead of example.com put example.com.. This means it will not go through the search path, reducing lookups by 5x in a typical Kubernetes install.

For an in-cluster address if you know the namespace you can construct the fqdn, e.g. servicename.namespacename.svc.cluster.local.
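
To illustrate why the trailing dot cuts the number of lookups, here is a rough Python sketch of how a glibc-style stub resolver expands a name using the search list and ndots. It is a simplification: a real resolver stops at the first name that resolves and usually sends both an A and an AAAA query per candidate. `candidate_names` is a hypothetical helper, not part of any library, and the search list is only an example of a typical in-cluster one.

```python
def candidate_names(name, search, ndots=5):
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, the bare name comes last.
    return expanded + [absolute]


# Example search list (an assumption, yours will differ).
search = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",
]
print(len(candidate_names("example.com", search)))   # 5
print(len(candidate_names("example.com.", search)))  # 1
```

With four search domains an external name can take up to five attempts without the dot and exactly one with it, which is where the 5x comes from.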

@bboreham great tip, I didn't know that one! Thanks

I did a little investigation on netfilter.org.
it appears that the iptables patch that adds --random-fully is in iptables v 1.6.2, released on 2/22/2018.

alpine:latest packages v 1.6.1, however alpine:edge packages v 1.6.2

For an in-cluster address if you know the namespace you can construct the fqdn, e.g. servicename.namespacename.svc.cluster.local.

This only works for some apps or resolvers. The bind tools honor that of course, since that is a decades-old syntax for bind's zone files. But for any apps that try to fix up an address or use a different resolver, that trick doesn't work. Curl is a good example of that not working.

From inside an alpine container curl https://kubernetes/ will hit the api server of course but so does curl https://kubernetes./

In our testing, we have found that only the options single-request-reopen change actually addresses this issue. It's a band-aid, but DNS lookups are fast, so we get aberrations of like 100ms, not 5 seconds, which is acceptable for us.

Now we're trying to figure out how to inject that into resolv.conf on all the pods. Anyone know how to do that?

I found this hack in some other related github issues and it's working for me

apiVersion: v1
data:
  resolv.conf: |
    nameserver 1.2.3.4
    search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
    options ndots:3 single-request-reopen
kind: ConfigMap
metadata:
  name: resolvconf

Then in your affected pods and containers

        volumeMounts:
        - name: resolv-conf
          mountPath: /etc/resolv.conf
          subPath: resolv.conf
...

      volumes:
      - name: resolv-conf
        configMap:
          name: resolvconf
          items:
          - key: resolv.conf
            path: resolv.conf

@btalbot thanks for posting that. That would definitely work in a pinch!

We use kops for our cluster, and this seems promising. But I'm still learning how it works.

Experiencing the same issue here. 5s delays on every, single, DNS lookup, 100% of the time. Similarly, insert_failed does increase for each DNS query. The AAAA query, that happens a few cycles after the A query, gets dropped systematically (tcpdump: https://hastebin.com/banulayire.swift).

Mounting a resolv.conf by hand in every single pod of our infrastructure is untenable.
kubernetes/kubernetes#62764 attempts to add the workaround as a default in Kubernetes, but the PR is unlikely to land. And even if it does, it won't be released for a good while.

Here is the flannel patch: https://gist.github.com/maxlaverse/1fb3bfdd2509e317194280f530158c98

@Quentin-M what k8s version are you using? I'm curious why it's 100% repeatable for some but intermittent for others.

Another method to inject resolv.conf changes would be a deployment initializer. I've been trying to avoid creating one, but it's beginning to seem inevitable that in an Enterprise environment you need a way to enforce various things on every launched workload in a central way.

I'm still investigating the use of kubelet --resolv-conf, but what I'm really worried about is that all this is just a bandaid..

The only actual fix is the iptables flag

brb commented

Has anyone tried installing and running iptables-1.6.2 from the alpine packages for edge on Alpine 3.7?

@brb I was wondering the same thing. It would be nice to make progress and get a PR ready in anticipation of availability of 1.6.2. My Go fu is too weak to take a shot at making the fix, but I'm guessing the fix goes somewhere around expose.go?

If it were possible to create a frankenversion that has this fix, we could test it out.

brb commented

Has anyone tried installing and running iptables-1.6.2 from the alpine packages for edge on Alpine 3.7?

Just installed it with apk add iptables --update-cache --repository http://dl-3.alpinelinux.org/alpine/edge/main/. However, I cannot guarantee that we don't miss anything with iptables from edge on 3.7.

the fix goes somewhere around expose.go

Yes, you are right.

If it were possible to create a frankenversion that has this fix, we could test it out.

I've just created the weave-kube image with the fix for amd64 arch only and kernel >= 3.13 (https://github.com/weaveworks/weave/tree/issues/3287-iptables-random-fully). To use it, please change the image name of weave-kube to "brb0/weave-kube:iptables-random-fully" in DaemonSet of Weave.

@brb Score! that's awesome! we'll try this out asap!
We're currently using image weaveworks/weave-kube:2.2.0, via a kops cluster. Would this image interoperate ok with those?

brb commented

I can't think of anything which would prevent it from working.

Please let us know whether it works, thanks!

@dcowden Kubernetes 1.10.1, Container Linux 1688.5.3-1758.0.0, AWS VPCs, Weave 2.3.0, kube-proxy IPVS. My guess is that it depends how fast/stable your network is?

@dcowden

I'm still investigating the use of kubelet --resolv-conf, but what I'm really worried about is that all this is just a bandaid..

I tried that the other day. While it changed the resolv.conf of my static pods, all the other pods (with default dnsPolicy) were still based on what dns.go constructs. Note that the DNS options are written as a constant there, so there is no possibility to get single-request-reopen without running your own compiled version of kubelet.

@brb Thanks! I hadn't realized yesterday that the patched iptables was already in an Alpine release. My issue is still present, and both insert_failed and drop are still increasing. I note however that there are two other MASQUERADE rules in place that do not have --random-fully, so that might be why? I am no network expert by any means, unfortunately.

# Setup by WEAVE too.
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

# Setup by both kubelet and kube-proxy, used to SNAT ports when querying services.
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

-A WEAVE ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE --random-fully
-A WEAVE -s 172.16.0.0/16 ! -d 172.16.0.0/16 -j MASQUERADE --random-fully
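
A quick way to list which MASQUERADE rules still lack the flag is to scan the nat table dump. The following is only a sketch, assuming the input is text in the `-A CHAIN ...` form shown above (as printed by `iptables -t nat --list-rules` or `iptables-save`); `masquerade_rules_missing_flag` is a hypothetical helper name.

```python
def masquerade_rules_missing_flag(rules_text, flag="--random-fully"):
    """Return the appended MASQUERADE rules that do not carry the given flag."""
    missing = []
    for line in rules_text.splitlines():
        tokens = line.split()
        # Only appended rules ("-A CHAIN ...") are of interest.
        if not tokens or tokens[0] != "-A":
            continue
        # The rule must jump to the MASQUERADE target ("-j MASQUERADE").
        jumps = [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-j"]
        if "MASQUERADE" in jumps and flag not in tokens:
            missing.append(line.strip())
    return missing


# The four rules quoted in the comment above.
rules = """\
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A WEAVE ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE --random-fully
-A WEAVE -s 172.16.0.0/16 ! -d 172.16.0.0/16 -j MASQUERADE --random-fully
"""
for rule in masquerade_rules_missing_flag(rules):
    print(rule)
```

For the rules above this prints the POSTROUTING and KUBE-POSTROUTING lines, which are the two without --random-fully.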

@brb, I tried this out. I was able to upgrade successfully, but it didn't help my problems.

I think maybe I don't have it installed correctly, because my iptables rules do not show the --random-fully flag anywhere.

Here's my daemonset (annotations and stuff after the image omitted):

dcowden@ubuntu:~/gitwork/kubernetes$ kc get ds weave-net -n kube-system -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  ...omitted annotations...
  creationTimestamp: 2017-12-21T16:37:59Z
  generation: 4
  labels:
    name: weave-net
    role.kubernetes.io/networking: "1"
  name: weave-net
  namespace: kube-system
  resourceVersion: "21973562"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/daemonsets/weave-net
  uid: 4dd96bf2-e66d-11e7-8b61-069a0a6ccd8c
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: weave-net
      role.kubernetes.io/networking: "1"
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        name: weave-net
        role.kubernetes.io/networking: "1"
    spec:
      containers:
      - command:
        - /home/weave/launch.sh
        env:
        - name: WEAVE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: weave-passwd
              name: weave-passwd
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: IPALLOC_RANGE
          value: 100.96.0.0/11
        - name: WEAVE_MTU
          value: "8912"
        image: brb0/weave-kube:iptables-random-fully
        ...more stuff...

The daemonset was updated OK. Here are the iptables rules I see on a host. I don't see --random-fully anywhere:

[root@ip-172-25-19-92 ~]# iptables --list-rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-SERVICES
-N WEAVE-IPSEC-IN
-N WEAVE-NPC
-N WEAVE-NPC-DEFAULT
-N WEAVE-NPC-INGRESS
-A INPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -j KUBE-FIREWALL
-A INPUT -j WEAVE-IPSEC-IN
-A FORWARD -o weave -m comment --comment "NOTE: this must go before \'-j KUBE-FORWARD\'" -j WEAVE-NPC
-A FORWARD -o weave -m state --state NEW -j NFLOG --nflog-group 86
-A FORWARD -o weave -j DROP
-A FORWARD -i weave ! -o weave -j ACCEPT
-A FORWARD -o weave -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forward rules" -j KUBE-FORWARD
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT ! -p esp -m policy --dir out --pol none -m mark --mark 0x20000/0x20000 -j DROP
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-SERVICES -d 100.65.65.105/32 -p tcp -m comment --comment "default/schaeffler-logstash:http has no endpoints" -m tcp --dport 9600 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31436 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 100.69.172.111/32 -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
-A WEAVE-IPSEC-IN -s 172.25.83.126/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.234/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.40/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.51.21/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.51.170/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.51.29/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.130/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC -m set ! --match-set weave-local-pods dst -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-f(09:Q6gzJb~LE_pU4n:@416L dst -m comment --comment "DefaultAllow isolation for namespace: ops" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-jXXXW48#WnolRYPFUalO(fLpK dst -m comment --comment "DefaultAllow isolation for namespace: troubleshooting" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-E.1.0W^NGSp]0_t5WwH/]gX@L dst -m comment --comment "DefaultAllow isolation for namespace: default" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-0EHD/vdN#O4]V?o4Tx7kS;APH dst -m comment --comment "DefaultAllow isolation for namespace: kube-public" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-?b%zl9GIe0AET1(QI^7NWe*fO dst -m comment --comment "DefaultAllow isolation for namespace: kube-system" -j ACCEPT

I don't know what to try next.

@dcowden You need to make sure you are calling iptables 1.6.2, otherwise you will not see the flag. One solution is to run iptables from within the weave container. As with you, it did not help my issue: the first AAAA query still appears to be dropped. I am compiling kube-proxy/kubelet to add the --random-fully flag there as well, but this is going to take a while.

@Quentin-M ah, ok right. I'll try that.

I have the same behavior-- i most commonly see the dropped packet on the first request, which is really odd.

@Quentin-M since you are using k8s 1.10, it appears you could use dnsPolicy None and then provide the values yourself. Are you trying to avoid that?

We're still using 1.8, so that's not an option for us.

@Quentin-M You can also customize the DNS settings like this:

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - xxxxx
    searches:
      - xxxxx
    options:
      - name: single-request-reopen

brb commented

@Quentin-M @dcowden

-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

I'm not aware of this rule, and based on grepping it seems that Docker inserts it.

As there are quite a few iptables rules, I need to understand your packet flow. Could you answer the following questions:

  1. Does the problem occur when you try to request kube-dns from a pod via ClusterIP of the kube-dns Service?
  2. Could you run sudo iptables-save -c before requesting kube-dns and the same cmd after (ideally, the request should fail due to the timeout)?
  3. sudo tcpdump -i weave -w foo.pcap on the host running a client pod while doing the steps above.
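
If it helps with step 2, the two `iptables-save -c` dumps can be compared with a small script so that only the rules whose packet counters moved are shown. This is just a sketch, assuming the usual `[packets:bytes] -A ...` prefix that the -c option adds; the function names and the counter values in the sample are invented for illustration.

```python
import re

# Matches a counted rule line such as "[12:720] -A WEAVE ... -j MASQUERADE".
COUNTED_RULE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A .*)$")


def rule_counters(dump):
    """Map (table, rule text) to the packet counter found in the dump."""
    counters = {}
    table = None
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            table = line[1:]  # "*nat", "*filter", ... start a new table
            continue
        match = COUNTED_RULE.match(line)
        if match:
            packets, _bytes, rule = match.groups()
            counters[(table, rule)] = int(packets)
    return counters


def changed_rules(before, after):
    """Return the packet delta for every rule whose counter changed."""
    old, new = rule_counters(before), rule_counters(after)
    return {
        key: new[key] - old.get(key, 0)
        for key in new
        if new[key] != old.get(key, 0)
    }


# Hypothetical dumps: the rules are from this thread, the counters are made up.
before = """*nat
[10:600] -A WEAVE -s 100.96.0.0/11 ! -d 100.96.0.0/11 -j MASQUERADE --random-fully
[3:180] -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT"""
after = """*nat
[12:720] -A WEAVE -s 100.96.0.0/11 ! -d 100.96.0.0/11 -j MASQUERADE --random-fully
[3:180] -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT"""
for (table, rule), delta in changed_rules(before, after).items():
    print(table, delta, rule)
```

Identical rules repeated within one table would collide in this sketch, so treat the result as a hint about which rules the DNS packets hit rather than as exact accounting.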

@brb I'm sorry to be needing more help, but I'm still unable to see the --random-fully on my rules.

I ran iptables from within the weave container, and confirmed that i do have iptables v 1.6.2:


/home/weave # iptables 
iptables v1.6.2: no command specified
Try `iptables -h' or 'iptables --help' for more information.
/home/weave # iptables --list-rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-SERVICES
-N WEAVE-IPSEC-IN
-N WEAVE-NPC
-N WEAVE-NPC-DEFAULT
-N WEAVE-NPC-INGRESS
-A INPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -j KUBE-FIREWALL
-A INPUT -j WEAVE-IPSEC-IN
-A FORWARD -o weave -m comment --comment "NOTE: this must go before \'-j KUBE-FORWARD\'" -j WEAVE-NPC
-A FORWARD -o weave -m state --state NEW -j NFLOG --nflog-group 86
-A FORWARD -o weave -j DROP
-A FORWARD -i weave ! -o weave -j ACCEPT
-A FORWARD -o weave -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forward rules" -j KUBE-FORWARD
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT ! -p esp -m policy --dir out --pol none -m mark --mark 0x20000/0x20000 -j DROP
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-SERVICES -d 100.65.65.105/32 -p tcp -m comment --comment "default/schaeffler-logstash:http has no endpoints" -m tcp --dport 9600 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31436 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 100.69.172.111/32 -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
-A WEAVE-IPSEC-IN -s 172.25.51.145/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.80/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.234/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.81/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.213/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.117/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.243/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC -m set ! --match-set weave-local-pods dst -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-E.1.0W^NGSp]0_t5WwH/]gX@L dst -m comment --comment "DefaultAllow isolation for namespace: default" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-0EHD/vdN#O4]V?o4Tx7kS;APH dst -m comment --comment "DefaultAllow isolation for namespace: kube-public" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-?b%zl9GIe0AET1(QI^7NWe*fO dst -m comment --comment "DefaultAllow isolation for namespace: kube-system" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-f(09:Q6gzJb~LE_pU4n:@416L dst -m comment --comment "DefaultAllow isolation for namespace: ops" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-jXXXW48#WnolRYPFUalO(fLpK dst -m comment --comment "DefaultAllow isolation for namespace: troubleshooting" -j ACCEPT
[root@ip-172-25-51-112 ~]# docker ps | grep weave
51b7fd4a2fb6        weaveworks/weave-npc@sha256:1d85c63e8b4cd433363d5527fdae263069d118308521490a9ea2d4b00b484a5e                                "/usr/bin/weave-npc"     8 hours ago         Up 8 hours                              k8s_weave-npc_weave-net-zj924_kube-system_3f526bf1-4dd6-11e8-93c1-06cef8be63fa_0
10fdaf3a91cd        brb0/weave-kube@sha256:84010a75a045b66cf79915b0c0bc44dce59692a30dbd6e80b00149301e5e9a4c                                     "/home/weave/launc..."   8 hours ago         Up 8 hours                              k8s_weave_weave-net-zj924_kube-system_3f526bf1-4dd6-11e8-93c1-06cef8be63fa_0
adef4ba0b8ea        gcr.io/google_containers/pause-amd64:3.0                                                                                    "/pause"                 8 hours ago         Up 8 hours                              k8s_POD_weave-net-zj924_kube-system_3f526bf1-4dd6-11e8-93c1-06cef8be63fa_0

My best guess is that the rules may not have been rebuilt when I updated the daemonset. I made sure the new image is running on all nodes, but all I did was update the ds and watch it terminate and create all the new containers. Maybe it didn't rebuild the iptables rules on the underlying nodes?

@xiaoxubeii Yes, thanks for the hint, but I would rather not expect every user to add this to fix an infrastructure bug. I'd prefer to use kubernetes/kubernetes#62764 or use the fixed weave+kubelet+kube-proxy.

@Quentin-M absolutely, workarounds are not ideal. Please report back when you've proven whether your kubelet fixes work. I think you're actually working on a key, core k8s issue that most people probably experience but don't even know it.

brb commented

@dcowden No worries. You need to specify the "nat" table: iptables -t nat --list-rules or just run iptables-save. The latter will dump rules from all tables.

If the rules with the --random-fully are missing, then please restart the nodes.

@brb thank you.
I ran those commands, and I can verify that the --random-fully options are indeed on the weave rules:

-A WEAVE -s 100.96.0.0/11 -d 224.0.0.0/4 -j RETURN
-A WEAVE ! -s 100.96.0.0/11 -d 100.96.0.0/11 -j MASQUERADE --random-fully
-A WEAVE -s 100.96.0.0/11 ! -d 100.96.0.0/11 -j MASQUERADE --random-fully

But, like @Quentin-M, I still have the problem, and I have about a trillion other iptables rules from kubelet that don't have --random-fully.

brb commented

@dcowden Could you answer my questions posted above? That would help me to identify the exact rules which need the flag.

@brb, sorry, forgot about those. I'll get you those answers tomorrow. Thanks for the continued help!

hey @brb i'm going to do these tests for you today.

Does the problem occur when you try to request kube-dns from a pod via ClusterIP of the kube-dns Service?

Do you mean just performing any DNS lookup, for example nslookup kubernetes?

brb commented

Any DNS lookup which would trigger the problem.

hi @brb, ok i have some findings, but not all of your questions answered.

Does the problem occur when you try to request kube-dns from a pod via ClusterIP of the kube-dns Service?

Yes. In our cluster, the only nameserver available is kube-dns. I.e., on my test pod, resolv.conf looks like this:

[root@dc-debug-856bf6cd69-zfrvt ~]# more /etc/resolv.conf
nameserver 100.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local colinx.com
options ndots:5

  2. Could you run sudo iptables-save -c before requesting kube-dns and the same cmd after (ideally, the request should fail due to the timeout)?

  3. sudo tcpdump -i weave -w foo.pcap on the host running a client pod while doing the steps above.

OK, this is interesting. Before, my test used curl against an in-cluster URL, and this creates failures about 1% of the time. For this test, I switched to running this instead:

time -p bash -c "for (( i=0; i<1000; i++ )); do dig ptplace-bff.default.svc.cluster.local; done;" | grep Query

My hope was that i could more readily duplicate the problem. But in fact, using dig vs curl makes it impossible to duplicate the problem, very odd!

So, regarding your last two questions, i'm still stuck trying to get a way to reliably duplicate the problem
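
One difference between the two tests is that curl resolves names through the libc resolver (getaddrinfo), while dig sends its own queries. A loop that goes through getaddrinfo may therefore be closer to what curl does. Here is a minimal Python sketch of such a loop; `slow_lookups` and its threshold are my own invention for illustration, and whether it actually reproduces the delay in a given cluster is untested.

```python
import socket
import time


def slow_lookups(host, attempts, threshold=1.0, resolve=socket.getaddrinfo):
    """Resolve host repeatedly and return (attempt, seconds) for slow lookups.

    `resolve` defaults to socket.getaddrinfo, i.e. the libc resolver path;
    it is a parameter so the loop can be exercised without a network.
    """
    slow = []
    for attempt in range(attempts):
        started = time.monotonic()
        try:
            resolve(host, None)
        except socket.gaierror:
            pass  # a failed lookup is still timed
        elapsed = time.monotonic() - started
        if elapsed >= threshold:
            slow.append((attempt, elapsed))
    return slow
```

Calling `slow_lookups("ptplace-bff.default.svc.cluster.local", 1000)` from inside a pod and printing the result would list any lookups that took a second or longer.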

@brb ok, I did #3.

In the packet capture where I caught it, the pod is looping doing a curl on a remote URL (which of course does a lookup).

The pod IP is 100.98.0.4. The two kube-dns pods are 100.103.128.12 and 100.98.0.7.

We have a timeout at packet #198562 (t=55.332202 in the capture).

We are back to the normal query loop around frame 198945.

 198500 55.320275   100.98.0.4            100.98.0.7            DNS      97     Standard query 0x08f5  AAAA ptplace-bff.default.svc.cluster.local
 198501 55.320285   100.98.0.4            100.98.0.7            DNS      97     Standard query 0x08f5  AAAA ptplace-bff.default.svc.cluster.local
 198502 55.320326   100.98.0.7            100.98.0.4            DNS      113    Standard query response 0x68b9  A 100.69.135.10
 198503 55.320352   100.98.0.7            100.98.0.4            DNS      113    Standard query response 0x68b9  A 100.69.135.10
 198504 55.320376   100.98.0.7            100.98.0.4            DNS      97     Standard query response 0x08f5 
 198505 55.320397   100.98.0.7            100.98.0.4            DNS      97     Standard query response 0x08f5 
 198506 55.321508   100.98.0.4            100.98.0.8            TCP      74     49660→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121433404 TSecr=0 WS=512
 198507 55.321532   100.98.0.4            100.98.0.8            TCP      74     [TCP Out-Of-Order] 49660→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121433404 TSecr=0 WS=512
 198508 55.321547   100.98.0.8            100.98.0.4            TCP      74     8080→49660 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121433405 TSecr=121433404 WS=512
 198509 55.321552   100.98.0.8            100.98.0.4            TCP      74     [TCP Out-Of-Order] 8080→49660 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121433405 TSecr=121433404 WS=512
 198510 55.321573   100.98.0.4            100.98.0.8            TCP      66     49660→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121433405 TSecr=121433405
 198511 55.321587   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198510#1] 49660→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121433405 TSecr=121433405
 198512 55.321575   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198510#2] 49660→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121433405 TSecr=121433405
 198513 55.321598   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198510#3] 49660→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121433405 TSecr=121433405
 198514 55.321648   100.98.0.4            100.98.0.8            HTTP     216    GET /ptpr/api/labels?tag=home&tag=layout&tag=addresses HTTP/1.1 
 198515 55.321658   100.98.0.4            100.98.0.8            HTTP     216    [TCP Retransmission] GET /ptpr/api/labels?tag=home&tag=layout&tag=addresses HTTP/1.1 
 198516 55.321664   100.98.0.8            100.98.0.4            TCP      66     8080→49660 [ACK] Seq=1 Ack=151 Win=28160 Len=0 TSval=121433405 TSecr=121433405
 198517 55.321668   100.98.0.8            100.98.0.4            TCP      78     [TCP Dup ACK 198516#1] 8080→49660 [ACK] Seq=1 Ack=151 Win=28160 Len=0 TSval=121433405 TSecr=121433405 SLE=1 SRE=151
 198518 55.321946   100.98.0.8            172.25.81.195         PGSQL    110    >P/B/D/E/S
 198519 55.322188   172.25.81.195         100.98.0.8            PGSQL    92     <1/2/n/I/Z
 198520 55.322259   100.98.0.8            172.25.81.195         PGSQL    280    >B/E/S
 198521 55.322461   100.98.0.8            100.117.128.13        TCP      66     47486→11211 [ACK] Seq=471395 Ack=10683 Win=52 Len=0 TSval=121433406 TSecr=132228483
 198522 55.322702   172.25.81.195         100.98.0.8            PGSQL    91     <2/C/Z
 198523 55.323537   100.98.0.8            100.98.0.4            TCP      8258   [TCP segment of a reassembled PDU]
 198524 55.323567   100.98.0.4            100.98.0.8            TCP      66     49660→8080 [ACK] Seq=151 Ack=8193 Win=44544 Len=0 TSval=121433407 TSecr=121433407
 198525 55.323576   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198524#1] 49660→8080 [ACK] Seq=151 Ack=8193 Win=44544 Len=0 TSval=121433407 TSecr=121433407
 198526 55.323611   100.98.0.8            100.98.0.4            TCP      419    [TCP segment of a reassembled PDU]
 198527 55.323629   100.98.0.4            100.98.0.8            TCP      66     49660→8080 [ACK] Seq=151 Ack=8546 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198528 55.323640   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198527#1] 49660→8080 [ACK] Seq=151 Ack=8546 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198529 55.323806   100.98.0.8            100.98.0.4            HTTP     71     HTTP/1.1 200   (application/json)
 198530 55.323893   100.98.0.4            100.98.0.8            TCP      66     49660→8080 [ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198531 55.323904   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198530#1] 49660→8080 [ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198532 55.323987   100.98.0.8            100.117.128.13        MEMCACHE 427    set 068DA37F175F39210FF361B112FAD0DF-n2 2048 10794 303 
 198533 55.324008   100.98.0.4            100.98.0.8            TCP      66     49660→8080 [FIN, ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198534 55.324020   100.98.0.4            100.98.0.8            TCP      66     [TCP Out-Of-Order] 49660→8080 [FIN, ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198535 55.324034   100.98.0.8            100.98.0.4            TCP      78     8080→49660 [ACK] Seq=8551 Ack=152 Win=28160 Len=0 TSval=121433407 TSecr=121433407 SLE=151 SRE=152
 198536 55.324090   100.98.0.8            100.98.0.4            TCP      66     8080→49660 [FIN, ACK] Seq=8551 Ack=152 Win=28160 Len=0 TSval=121433407 TSecr=121433407
 198537 55.324109   100.98.0.4            100.98.0.8            TCP      66     49660→8080 [ACK] Seq=152 Ack=8552 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198538 55.324118   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198537#1] 49660→8080 [ACK] Seq=152 Ack=8552 Win=60928 Len=0 TSval=121433407 TSecr=121433407
 198539 55.324643   100.117.128.13        100.98.0.8            MEMCACHE 74     STORED 
 198540 55.324691   100.98.0.8            100.117.128.13        TCP      66     47486→11211 [ACK] Seq=471756 Ack=10691 Win=52 Len=0 TSval=121433408 TSecr=132228524
 198541 55.328316   100.98.0.4            100.98.0.7            DNS      123    Standard query 0x5d19  A ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198542 55.328350   100.98.0.4            100.98.0.7            DNS      123    Standard query 0x5d19  A ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198543 55.328372   100.98.0.4            100.98.0.7            DNS      123    Standard query 0xc81f  AAAA ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198544 55.328383   100.98.0.4            100.98.0.7            DNS      123    Standard query 0xc81f  AAAA ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198545 55.328577   100.98.0.7            100.98.0.4            DNS      216    Standard query response 0x5d19 No such name
 198546 55.328743   100.98.0.7            100.98.0.4            DNS      216    Standard query response 0xc81f No such name
 198547 55.328833   100.98.0.4            100.103.128.12        DNS      115    Standard query 0x989e  A ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198548 55.328845   100.98.0.4            100.103.128.12        DNS      115    Standard query 0x989e  A ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198549 55.328898   100.98.0.4            100.103.128.12        DNS      115    Standard query 0x5224  AAAA ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198550 55.328909   100.98.0.4            100.103.128.12        DNS      115    Standard query 0x5224  AAAA ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198551 55.330545   100.103.128.12        100.98.0.4            DNS      208    Standard query response 0x989e No such name
 198552 55.330548   100.103.128.12        100.98.0.4            DNS      208    Standard query response 0x5224 No such name
 198553 55.330679   100.98.0.4            100.103.128.12        DNS      111    Standard query 0x0e43  A ptplace-bff.default.svc.cluster.local.cluster.local
 198554 55.330698   100.98.0.4            100.103.128.12        DNS      111    Standard query 0x0e43  A ptplace-bff.default.svc.cluster.local.cluster.local
 198555 55.330737   100.98.0.4            100.103.128.12        DNS      111    Standard query 0xe867  AAAA ptplace-bff.default.svc.cluster.local.cluster.local
 198556 55.330753   100.98.0.4            100.103.128.12        DNS      111    Standard query 0xe867  AAAA ptplace-bff.default.svc.cluster.local.cluster.local
 198557 55.332057   100.103.128.12        100.98.0.4            DNS      204    Standard query response 0x0e43 No such name
 198558 55.332058   100.103.128.12        100.98.0.4            DNS      204    Standard query response 0xe867 No such name
 198559 55.332133   100.98.0.4            100.103.128.12        DNS      108    Standard query 0xf3da  A ptplace-bff.default.svc.cluster.local.colinx.com
 198560 55.332150   100.98.0.4            100.103.128.12        DNS      108    Standard query 0xf3da  A ptplace-bff.default.svc.cluster.local.colinx.com
 198561 55.332187   100.98.0.4            100.103.128.12        DNS      108    Standard query 0x4753  AAAA ptplace-bff.default.svc.cluster.local.colinx.com
 198562 55.332202   100.98.0.4            100.103.128.12        DNS      108    Standard query 0x4753  AAAA ptplace-bff.default.svc.cluster.local.colinx.com
 198563 55.353498   100.98.0.8            100.99.128.12         TCP      66     57650→11211 [ACK] Seq=513632 Ack=11619 Win=52 Len=0 TSval=121433437 TSecr=124895267
 198564 55.362483   100.98.0.8            172.25.81.195         TCP      66     36356→5432 [ACK] Seq=133131 Ack=26818 Win=2384 Len=0 TSval=121433446 TSecr=409147131
 198565 55.502318   100.117.128.3         100.98.0.7            DNS      104    Standard query 0x0473  A 0.datadog.pool.ntp.org.ops.svc.cluster.local
 198566 55.502871   100.98.0.7            100.117.128.3         DNS      197    Standard query response 0x0473 No such name
 198567 55.504785   100.117.128.3         100.98.0.7            DNS      96     Standard query 0xfc94  AAAA 0.datadog.pool.ntp.org.cluster.local
 198568 55.505016   100.98.0.7            100.117.128.3         DNS      189    Standard query response 0xfc94 No such name
 198569 55.531656   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198570 55.535372   100.64.0.1            100.98.0.2            TLSv1.2  130    Application Data
 198571 55.535507   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198572 55.535532   100.64.0.1            100.98.0.2            TLSv1.2  6306   Application Data
 198573 55.535538   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198574 55.535526   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1373 Ack=496019 Win=12271 Len=0 TSval=121433619 TSecr=135868614
 198575 55.535583   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1373 Ack=502297 Win=12271 Len=0 TSval=121433619 TSecr=135868614
 198576 55.535679   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198577 55.535735   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198578 55.536280   100.64.0.1            100.98.0.2            TCP      66     443→53076 [ACK] Seq=502297 Ack=1411 Win=1028 Len=0 TSval=135868615 TSecr=121433619
 198579 55.636583   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198580 55.639235   100.64.0.1            100.98.0.2            TLSv1.2  107    Application Data
 198581 55.639357   100.64.0.1            100.98.0.2            TLSv1.2  16479  Application Data
 198582 55.639382   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198583 55.639407   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198584 55.639424   100.64.0.1            100.98.0.2            TLSv1.2  4701   Application Data
 198585 55.639433   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198586 55.639452   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1503 Ack=532322 Win=12271 Len=0 TSval=121433723 TSecr=135868718
 198587 55.639560   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198588 55.639599   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198589 55.639637   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198590 55.640078   100.64.0.1            100.98.0.2            TCP      66     443→53076 [ACK] Seq=532322 Ack=1541 Win=1028 Len=0 TSval=135868719 TSecr=121433723
 198591 55.640097   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198592 55.640659   100.64.0.1            100.98.0.2            TCP      66     443→53076 [ACK] Seq=532322 Ack=1629 Win=1028 Len=0 TSval=135868719 TSecr=121433723
 198593 55.642282   100.64.0.1            100.98.0.2            TLSv1.2  107    Application Data
 198594 55.642436   100.64.0.1            100.98.0.2            TLSv1.2  15166  Application Data
 198595 55.642458   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198596 55.642533   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1675 Ack=547501 Win=12271 Len=0 TSval=121433726 TSecr=135868721
 198597 55.642612   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198598 55.642659   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198599 55.642887   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198600 55.643170   100.64.0.1            100.98.0.2            TCP      66     443→53076 [ACK] Seq=547501 Ack=1713 Win=1028 Len=0 TSval=135868722 TSecr=121433726
 198601 55.653343   100.64.0.1            100.98.0.2            TLSv1.2  107    Application Data
 198602 55.653476   100.64.0.1            100.98.0.2            TLSv1.2  16479  Application Data
 198603 55.653518   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198604 55.653529   100.64.0.1            100.98.0.2            TLSv1.2  16479  Application Data
 198605 55.653576   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198606 55.653587   100.64.0.1            100.98.0.2            TLSv1.2  16479  Application Data
 198607 55.653621   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198608 55.653660   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data, Application Data
 198609 55.653680   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198610 55.653702   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1805 Ack=623475 Win=12172 Len=0 TSval=121433737 TSecr=135868732
 198611 55.653826   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198612 55.653857   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198613 55.653894   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198614 55.654411   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data
 198615 55.654456   100.64.0.1            100.98.0.2            TLSv1.2  26646  Application Data
 198616 55.654466   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1931 Ack=641195 Win=12271 Len=0 TSval=121433738 TSecr=135868733
 198617 55.654481   100.64.0.1            100.98.0.2            TLSv1.2  26646  Application Data
 198618 55.654521   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1931 Ack=694355 Win=12212 Len=0 TSval=121433738 TSecr=135868733
 198619 55.654526   100.64.0.1            100.98.0.2            TLSv1.2  8926   Application Data
 198620 55.654611   100.64.0.1            100.98.0.2            TLSv1.2  8926   Application Data
 198621 55.654623   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1931 Ack=712075 Win=12271 Len=0 TSval=121433738 TSecr=135868733
 198622 55.654654   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198623 55.654749   100.64.0.1            100.98.0.2            TLSv1.2  8926   Application Data
 198624 55.654760   100.98.0.2            172.25.19.81          TCP      66     53076→443 [ACK] Seq=1931 Ack=729795 Win=12271 Len=0 TSval=121433738 TSecr=135868733
 198625 55.654797   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198626 55.654808   100.64.0.1            100.98.0.2            TCP      66     443→53076 [ACK] Seq=738655 Ack=1885 Win=1028 Len=0 TSval=135868733 TSecr=121433737
 198627 55.654820   100.98.0.2            172.25.19.81          TLSv1.2  276    Application Data, Application Data, Application Data, Application Data, Application Data
 198628 55.655079   100.64.0.1            100.98.0.2            TLSv1.2  7232   Application Data
 198629 55.655159   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198630 55.655689   100.64.0.1            100.98.0.2            TCP      66     443→53076 [ACK] Seq=745821 Ack=2137 Win=1028 Len=0 TSval=135868734 TSecr=121433738
 198631 55.970506   100.64.0.1            100.98.0.7            TLSv1.2  522    Application Data
 198632 55.970532   100.98.0.7            172.25.19.81          TCP      66     34358→443 [ACK] Seq=211 Ack=25567 Win=2072 Len=0 TSval=121434054 TSecr=135869049
 198633 56.079985   6e:3f:90:e5:38:2e     Broadcast             ARP      42     Who has 100.101.0.9?  Tell 100.117.128.10
 198634 56.372507   100.98.0.0            100.98.0.7            TCP      74     36618→10054 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121434456 TSecr=0 WS=512
 198635 56.372549   100.98.0.7            100.98.0.0            TCP      74     10054→36618 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121434456 TSecr=121434456 WS=512
 198636 56.372577   100.98.0.0            100.98.0.7            TCP      66     36618→10054 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121434456 TSecr=121434456
 198637 56.372682   100.98.0.0            100.98.0.7            HTTP     197    GET /healthcheck/dnsmasq HTTP/1.1 
 198638 56.372694   100.98.0.7            100.98.0.0            TCP      66     10054→36618 [ACK] Seq=1 Ack=132 Win=28160 Len=0 TSval=121434456 TSecr=121434456
 198639 56.372795   100.98.0.7            100.98.0.0            HTTP     244    HTTP/1.1 200 OK  (application/json)
 198640 56.372820   100.98.0.0            100.98.0.7            TCP      66     36618→10054 [ACK] Seq=132 Ack=179 Win=28160 Len=0 TSval=121434456 TSecr=121434456
 198641 56.372864   100.98.0.7            100.98.0.0            TCP      66     10054→36618 [FIN, ACK] Seq=179 Ack=132 Win=28160 Len=0 TSval=121434456 TSecr=121434456
 198642 56.372884   100.98.0.0            100.98.0.7            TCP      66     36618→10054 [FIN, ACK] Seq=132 Ack=180 Win=28160 Len=0 TSval=121434456 TSecr=121434456
 198643 56.372895   100.98.0.7            100.98.0.0            TCP      66     10054→36618 [ACK] Seq=180 Ack=133 Win=28160 Len=0 TSval=121434456 TSecr=121434456
 198644 56.852484   e2:ee:38:bf:90:8e     be:f3:86:69:94:76     ARP      42     Who has 100.98.0.0?  Tell 100.98.0.6
 198645 56.852506   be:f3:86:69:94:76     e2:ee:38:bf:90:8e     ARP      42     100.98.0.0 is at be:f3:86:69:94:76
 198646 57.013894   100.64.0.1            100.98.0.7            TLSv1.2  540    Application Data
 198647 57.013921   100.98.0.7            172.25.19.81          TCP      66     34358→443 [ACK] Seq=211 Ack=26041 Win=2072 Len=0 TSval=121435097 TSecr=135870092
 198648 57.237911   100.98.0.2            52.202.168.18         TLSv1.2  701    Application Data
 198649 57.246563   52.202.168.18         100.98.0.2            TLSv1.2  339    Application Data
 198650 57.246634   100.98.0.2            52.202.168.18         TCP      66     59084→443 [ACK] Seq=44206 Ack=1912 Win=1030 Len=0 TSval=121435330 TSecr=768312116
 198651 57.246976   100.98.0.2            52.202.168.18         TLSv1.2  4191   Application Data
 198652 57.247024   100.98.0.2            52.202.168.18         TLSv1.2  4191   Application Data
 198653 57.247052   100.98.0.2            52.202.168.18         TLSv1.2  4191   Application Data
 198654 57.250658   52.202.168.18         100.98.0.2            TCP      66     443→59084 [ACK] Seq=1912 Ack=46940 Win=422 Len=0 TSval=768312117 TSecr=121435330
 198655 57.250913   52.202.168.18         100.98.0.2            TCP      66     443→59084 [ACK] Seq=1912 Ack=48331 Win=422 Len=0 TSval=768312117 TSecr=121435330
 198656 57.250956   52.202.168.18         100.98.0.2            TCP      66     443→59084 [ACK] Seq=1912 Ack=52456 Win=422 Len=0 TSval=768312117 TSecr=121435330
 198657 57.250946   100.98.0.2            52.202.168.18         TLSv1.2  2268   Application Data
 198658 57.251998   52.202.168.18         100.98.0.2            TCP      66     443→59084 [ACK] Seq=1912 Ack=56581 Win=422 Len=0 TSval=768312118 TSecr=121435330
 198659 57.255972   52.202.168.18         100.98.0.2            TCP      66     443→59084 [ACK] Seq=1912 Ack=58783 Win=422 Len=0 TSval=768312118 TSecr=121435334
 198660 57.271252   52.202.168.18         100.98.0.2            TLSv1.2  339    Application Data
 198661 57.310502   100.98.0.2            52.202.168.18         TCP      66     59084→443 [ACK] Seq=58783 Ack=2185 Win=1030 Len=0 TSval=121435394 TSecr=768312122
 198662 57.325998   100.98.0.2            52.207.126.249        TCP      2800   [TCP segment of a reassembled PDU]
 198663 57.326031   100.98.0.2            52.207.126.249        TLSv1.2  537    Application Data
 198664 57.330544   52.207.126.249        100.98.0.2            TCP      66     443→45826 [ACK] Seq=936 Ack=18762 Win=422 Len=0 TSval=3460458049 TSecr=121435409
 198665 57.331805   52.207.126.249        100.98.0.2            TLSv1.2  253    Application Data
 198666 57.331847   100.98.0.2            52.207.126.249        TCP      66     45826→443 [ACK] Seq=19233 Ack=1123 Win=1030 Len=0 TSval=121435415 TSecr=3460458049
 198667 57.508473   9a:61:c8:96:90:d6     a2:a9:d4:36:63:da     ARP      42     Who has 100.117.128.3?  Tell 100.98.0.7
 198668 57.509235   a2:a9:d4:36:63:da     9a:61:c8:96:90:d6     ARP      42     100.117.128.3 is at a2:a9:d4:36:63:da
 198669 57.537663   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198670 57.538279   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=487095 Ack=1495 Win=247 Len=0 TSval=135870617 TSecr=121435621
 198671 57.540358   100.64.0.1            100.98.0.2            TLSv1.2  130    Application Data
 198672 57.540482   100.64.0.1            100.98.0.2            TLSv1.2  16479  Application Data
 198673 57.540555   100.64.0.1            100.98.0.2            TLSv1.2  13599  Application Data, Application Data
 198674 57.540570   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198675 57.540568   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1541 Ack=503572 Win=12271 Len=0 TSval=121435624 TSecr=135870619
 198676 57.540613   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1541 Ack=517143 Win=12271 Len=0 TSval=121435624 TSecr=135870619
 198677 57.540692   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198678 57.540727   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198679 57.540758   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198680 57.541267   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=517143 Ack=1621 Win=247 Len=0 TSval=135870620 TSecr=121435624
 198681 57.541285   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198682 57.543650   100.64.0.1            100.98.0.2            TLSv1.2  107    Application Data
 198683 57.543815   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198684 57.543842   100.64.0.1            100.98.0.2            TLSv1.2  6306   Application Data
 198685 57.543830   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1713 Ack=526044 Win=12271 Len=0 TSval=121435627 TSecr=135870622
 198686 57.543889   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198687 57.543921   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1713 Ack=532322 Win=12271 Len=0 TSval=121435627 TSecr=135870622
 198688 57.543973   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198689 57.544007   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198690 57.544222   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198691 57.544506   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=532322 Ack=1751 Win=247 Len=0 TSval=135870623 TSecr=121435627
 198692 57.554597   100.64.0.1            100.98.0.2            TLSv1.2  107    Application Data
 198693 57.554720   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198694 57.554747   100.64.0.1            100.98.0.2            TLSv1.2  7619   Application Data
 198695 57.554754   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198696 57.554735   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1843 Ack=541223 Win=12271 Len=0 TSval=121435638 TSecr=135870633
 198697 57.554799   100.64.0.1            100.98.0.2            TLSv1.2  16479  Application Data
 198698 57.554809   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198699 57.554807   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1843 Ack=548814 Win=12271 Len=0 TSval=121435638 TSecr=135870633
 198700 57.554876   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1843 Ack=565265 Win=12271 Len=0 TSval=121435638 TSecr=135870633
 198701 57.554946   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198702 57.554971   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data, Application Data
 198703 57.554979   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198704 57.555006   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198705 57.555481   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198706 57.555519   100.64.0.1            100.98.0.2            TLSv1.2  26646  Application Data
 198707 57.555553   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198708 57.555550   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=1969 Ack=618425 Win=12250 Len=0 TSval=121435639 TSecr=135870634
 198709 57.555568   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data
 198710 57.555585   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=645005 Ack=1923 Win=247 Len=0 TSval=135870634 TSecr=121435638
 198711 57.555593   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198712 57.555651   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198713 57.555716   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198714 57.556181   100.64.0.1            100.98.0.2            TLSv1.2  8926   Application Data
 198715 57.556209   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data
 198716 57.556226   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data
 198717 57.556242   100.64.0.1            100.98.0.2            TLSv1.2  8926   Application Data
 198718 57.556242   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=2095 Ack=671585 Win=12250 Len=0 TSval=121435639 TSecr=135870634
 198719 57.556253   100.64.0.1            100.98.0.2            TCP      8926   [TCP segment of a reassembled PDU]
 198720 57.556259   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=707025 Ack=2049 Win=247 Len=0 TSval=135870635 TSecr=121435639
 198721 57.556297   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=2095 Ack=707025 Win=12238 Len=0 TSval=121435639 TSecr=135870634
 198722 57.556377   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198723 57.556483   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198724 57.556849   100.64.0.1            100.98.0.2            TLSv1.2  17786  Application Data
 198725 57.556878   100.64.0.1            100.98.0.2            TLSv1.2  5963   Application Data
 198726 57.556894   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=2179 Ack=730642 Win=12254 Len=0 TSval=121435640 TSecr=135870635
 198727 57.556942   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=730642 Ack=2133 Win=247 Len=0 TSval=135870635 TSecr=121435639
 198728 57.556987   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198729 57.597224   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=730642 Ack=2175 Win=247 Len=0 TSval=135870676 TSecr=121435640
 198730 57.696377   100.98.0.0            100.98.0.8            TCP      74     37916→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121435779 TSecr=0 WS=512
 198731 57.696422   100.98.0.8            100.98.0.0            TCP      74     8080→37916 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121435780 TSecr=121435779 WS=512
 198732 57.696450   100.98.0.0            100.98.0.8            TCP      66     37916→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121435780 TSecr=121435780
 198733 57.696569   100.98.0.0            100.98.0.8            HTTP     195    GET /ptpr/api/heartbeat HTTP/1.1 
 198734 57.696581   100.98.0.8            100.98.0.0            TCP      66     8080→37916 [ACK] Seq=1 Ack=130 Win=28160 Len=0 TSval=121435780 TSecr=121435780
 198735 57.696967   100.98.0.8            100.117.128.13        MEMCACHE 79     get ping-n2 
 198736 57.697861   100.117.128.13        100.98.0.8            MEMCACHE 71     END 
 198737 57.697890   100.98.0.8            100.117.128.13        TCP      66     47486→11211 [ACK] Seq=471769 Ack=10696 Win=52 Len=0 TSval=121435781 TSecr=132230898
 198738 57.698340   100.98.0.8            172.25.81.195         PGSQL    110    >P/B/D/E/S
 198739 57.698680   172.25.81.195         100.98.0.8            PGSQL    92     <1/2/n/I/Z
 198740 57.698711   100.98.0.8            172.25.81.195         TCP      66     36356→5432 [ACK] Seq=133175 Ack=26844 Win=2384 Len=0 TSval=121435782 TSecr=409147725
 198741 57.698780   100.98.0.8            172.25.81.195         PGSQL    172    >B/E/S
 198742 57.699224   172.25.81.195         100.98.0.8            PGSQL    91     <2/C/Z
 198743 57.700341   100.98.0.8            100.98.0.0            HTTP     378    HTTP/1.1 200   (text/plain)
 198744 57.700359   100.98.0.0            100.98.0.8            TCP      66     37916→8080 [ACK] Seq=130 Ack=313 Win=28160 Len=0 TSval=121435783 TSecr=121435783
 198745 57.700476   100.98.0.8            100.98.0.0            TCP      66     8080→37916 [FIN, ACK] Seq=313 Ack=130 Win=28160 Len=0 TSval=121435784 TSecr=121435783
 198746 57.700560   100.98.0.8            100.117.128.13        MEMCACHE 427    set 97C9FAB1F8419D9BF4F3B2EB01D00DDA-n2 2048 10794 303 
 198747 57.700711   100.98.0.0            100.98.0.8            TCP      66     37916→8080 [FIN, ACK] Seq=130 Ack=314 Win=28160 Len=0 TSval=121435784 TSecr=121435784
 198748 57.700724   100.98.0.8            100.98.0.0            TCP      66     8080→37916 [ACK] Seq=314 Ack=131 Win=28160 Len=0 TSval=121435784 TSecr=121435784
 198749 57.701242   100.117.128.13        100.98.0.8            MEMCACHE 74     STORED 
 198750 57.738489   100.98.0.8            172.25.81.195         TCP      66     36356→5432 [ACK] Seq=133281 Ack=26869 Win=2384 Len=0 TSval=121435822 TSecr=409147725
 198751 57.740486   100.98.0.8            100.117.128.13        TCP      66     47486→11211 [ACK] Seq=472130 Ack=10704 Win=52 Len=0 TSval=121435824 TSecr=132230901
 198752 57.983439   100.64.0.1            100.98.0.7            TLSv1.2  522    Application Data
 198753 57.983498   100.98.0.7            172.25.19.81          TCP      66     34358→443 [ACK] Seq=211 Ack=26497 Win=2072 Len=0 TSval=121436067 TSecr=135871062
 198754 58.075324   100.98.0.0            100.98.0.8            TCP      74     37918→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121436158 TSecr=0 WS=512
 198755 58.075358   100.98.0.8            100.98.0.0            TCP      74     8080→37918 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121436158 TSecr=121436158 WS=512
 198756 58.075382   100.98.0.0            100.98.0.8            TCP      66     37918→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121436158 TSecr=121436158
 198757 58.075525   100.98.0.0            100.98.0.8            HTTP     195    GET /ptpr/api/heartbeat HTTP/1.1 
 198758 58.075537   100.98.0.8            100.98.0.0            TCP      66     8080→37918 [ACK] Seq=1 Ack=130 Win=28160 Len=0 TSval=121436159 TSecr=121436159
 198759 58.076178   100.98.0.8            172.25.81.195         PGSQL    110    >P/B/D/E/S
 198760 58.076459   172.25.81.195         100.98.0.8            PGSQL    92     <1/2/n/I/Z
 198761 58.076497   100.98.0.8            172.25.81.195         TCP      66     36356→5432 [ACK] Seq=133325 Ack=26895 Win=2384 Len=0 TSval=121436159 TSecr=409147819
 198762 58.076584   100.98.0.8            172.25.81.195         PGSQL    172    >B/E/S
 198763 58.077023   172.25.81.195         100.98.0.8            PGSQL    91     <2/C/Z
 198764 58.078135   100.98.0.8            100.98.0.0            HTTP     378    HTTP/1.1 200   (text/plain)
 198765 58.078157   100.98.0.0            100.98.0.8            TCP      66     37918→8080 [ACK] Seq=130 Ack=313 Win=28160 Len=0 TSval=121436161 TSecr=121436161
 198766 58.078258   100.98.0.8            100.98.0.0            TCP      66     8080→37918 [FIN, ACK] Seq=313 Ack=130 Win=28160 Len=0 TSval=121436161 TSecr=121436161
 198767 58.078554   100.98.0.8            100.117.128.13        MEMCACHE 427    set 3FD594ED82AE4D92E6E260408E2A1C70-n2 2048 10794 303 
 198768 58.078578   100.98.0.0            100.98.0.8            TCP      66     37918→8080 [FIN, ACK] Seq=130 Ack=314 Win=28160 Len=0 TSval=121436162 TSecr=121436161
 198769 58.078590   100.98.0.8            100.98.0.0            TCP      66     8080→37918 [ACK] Seq=314 Ack=131 Win=28160 Len=0 TSval=121436162 TSecr=121436162
 198770 58.079337   100.117.128.13        100.98.0.8            MEMCACHE 74     STORED 
 198771 58.079360   100.98.0.8            100.117.128.13        TCP      66     47486→11211 [ACK] Seq=472491 Ack=10712 Win=52 Len=0 TSval=121436162 TSecr=132231279
 198772 58.110401   100.98.0.8            100.103.128.12        DNS      98     Standard query 0x962f  A configserver.default.svc.cluster.local
 198773 58.110435   100.98.0.8            100.103.128.12        DNS      98     Standard query 0x962f  A configserver.default.svc.cluster.local
 198774 58.111231   100.103.128.12        100.98.0.8            DNS      114    Standard query response 0x962f  A 100.69.199.108
 198775 58.111232   100.103.128.12        100.98.0.8            DNS      114    Standard query response 0x962f  A 100.69.199.108
 198776 58.111406   100.98.0.8            100.99.128.9          TCP      74     60214→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121436194 TSecr=0 WS=512
 198777 58.111430   100.98.0.8            100.99.128.9          TCP      74     [TCP Out-Of-Order] 60214→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121436194 TSecr=0 WS=512
 198778 58.112642   100.99.128.9          100.98.0.8            TCP      74     8080→60214 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=124898065 TSecr=121436194 WS=512
 198779 58.112643   100.99.128.9          100.98.0.8            TCP      74     [TCP Out-Of-Order] 8080→60214 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=124898065 TSecr=121436194 WS=512
 198780 58.112673   100.98.0.8            100.99.128.9          TCP      66     60214→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121436196 TSecr=124898065
 198781 58.112686   100.98.0.8            100.99.128.9          TCP      66     [TCP Dup ACK 198780#1] 60214→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121436196 TSecr=124898065
 198782 58.112677   100.98.0.8            100.99.128.9          TCP      66     [TCP Dup ACK 198780#2] 60214→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121436196 TSecr=124898065
 198783 58.112693   100.98.0.8            100.99.128.9          TCP      66     [TCP Dup ACK 198780#3] 60214→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121436196 TSecr=124898065
 198784 58.113027   100.98.0.8            100.99.128.9          HTTP     287    GET /ptplace-default.properties HTTP/1.1 
 198785 58.113044   100.98.0.8            100.99.128.9          HTTP     287    [TCP Retransmission] GET /ptplace-default.properties HTTP/1.1 
 198786 58.113995   100.99.128.9          100.98.0.8            TCP      66     8080→60214 [ACK] Seq=1 Ack=222 Win=28160 Len=0 TSval=124898067 TSecr=121436196
 198787 58.113996   100.99.128.9          100.98.0.8            TCP      78     [TCP Dup ACK 198786#1] 8080→60214 [ACK] Seq=1 Ack=222 Win=28160 Len=0 TSval=124898067 TSecr=121436196 SLE=1 SRE=222
 198788 58.117553   100.98.0.8            172.25.81.195         TCP      66     36356→5432 [ACK] Seq=133431 Ack=26920 Win=2384 Len=0 TSval=121436201 TSecr=409147820
 198789 58.122885   a2:74:89:68:f0:a4     Broadcast             ARP      42     Who has 100.98.0.7?  Tell 100.99.128.0
 198790 58.136593   9a:61:c8:96:90:d6     a2:74:89:68:f0:a4     ARP      42     100.98.0.7 is at 9a:61:c8:96:90:d6
 198791 58.137641   100.99.128.9          100.98.0.7            DNS      95     Standard query 0xad39  A gitlab.colinx.com.svc.cluster.local
 198792 58.137992   100.98.0.7            100.99.128.9          DNS      188    Standard query response 0xad39 No such name
 198793 58.138988   100.99.128.9          100.98.0.7            DNS      95     Standard query 0xc63e  AAAA gitlab.colinx.com.svc.cluster.local
 198794 58.139275   100.98.0.7            100.99.128.9          DNS      188    Standard query response 0xc63e No such name
 198795 58.144166   100.99.128.9          100.98.0.7            DNS      88     Standard query 0xf9c8  A gitlab.colinx.com.colinx.com
 198796 58.144254   100.98.0.7            100.99.128.9          DNS      88     Standard query response 0xf9c8 No such name
 198797 58.145273   100.99.128.9          100.98.0.7            DNS      88     Standard query 0x70cd  AAAA gitlab.colinx.com.colinx.com
 198798 58.145336   100.98.0.7            100.99.128.9          DNS      88     Standard query response 0x70cd No such name
 198799 58.146307   100.99.128.9          100.98.0.7            DNS      77     Standard query 0xaf4e  A gitlab.colinx.com
 198800 58.146384   100.98.0.7            100.99.128.9          DNS      132    Standard query response 0xaf4e  CNAME isc-p-gitlab-1.colinx.com A 172.30.10.72
 198801 58.147333   100.99.128.9          100.98.0.7            DNS      77     Standard query 0x5053  AAAA gitlab.colinx.com
 198802 58.147405   100.98.0.7            100.99.128.9          DNS      116    Standard query response 0x5053  CNAME isc-p-gitlab-1.colinx.com
 198803 58.261880   100.98.0.8            69.12.41.164          MQ       386    SPI               (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198804 58.262952   69.12.41.164          100.98.0.8            MQ       386    SPI_REPLY         Hdl=0x00000002 (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198805 58.263019   100.98.0.8            69.12.41.164          TCP      66     52976→1414 [ACK] Seq=2329 Ack=2641 Win=23684 Len=0 TSval=121436346 TSecr=1544869
 198806 58.263534   100.98.0.8            69.12.41.164          MQ       894    MQPUT             Hdl=0x00000002[Malformed Packet]
 198807 58.266874   100.98.0.0            100.98.0.5            TCP      74     54070→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121436350 TSecr=0 WS=512
 198808 58.266899   100.98.0.5            100.98.0.0            TCP      74     8080→54070 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121436350 TSecr=121436350 WS=512
 198809 58.266915   100.98.0.0            100.98.0.5            TCP      66     54070→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121436350 TSecr=121436350
 198810 58.267041   100.98.0.0            100.98.0.5            HTTP     184    GET /healthz HTTP/1.1
 198811 58.267050   100.98.0.5            100.98.0.0            TCP      66     8080→54070 [ACK] Seq=1 Ack=119 Win=26624 Len=0 TSval=121436350 TSecr=121436350
 198812 58.267308   100.98.0.5            100.98.0.0            HTTP     656    HTTP/1.1 200 OK  (text/plain)
 198813 58.267322   100.98.0.0            100.98.0.5            TCP      66     54070→8080 [ACK] Seq=119 Ack=591 Win=28160 Len=0 TSval=121436350 TSecr=121436350
 198814 58.267340   100.98.0.5            100.98.0.0            TCP      66     8080→54070 [FIN, ACK] Seq=591 Ack=119 Win=26624 Len=0 TSval=121436350 TSecr=121436350
 198815 58.267431   100.98.0.0            100.98.0.5            TCP      66     54070→8080 [FIN, ACK] Seq=119 Ack=592 Win=28160 Len=0 TSval=121436350 TSecr=121436350
 198816 58.267457   100.98.0.5            100.98.0.0            TCP      66     8080→54070 [ACK] Seq=592 Ack=120 Win=26624 Len=0 TSval=121436351 TSecr=121436350
 198817 58.303813   69.12.41.164          100.98.0.8            TCP      66     1414→52976 [ACK] Seq=2641 Ack=3157 Win=640 Len=0 TSval=1544910 TSecr=121436347
 198818 58.425679   100.99.128.9          100.98.0.8            HTTP     2550   HTTP/1.1 200   (text/plain)
 198819 58.425755   100.98.0.8            100.99.128.9          TCP      66     60214→8080 [ACK] Seq=222 Ack=2485 Win=31744 Len=0 TSval=121436509 TSecr=124898378
 198820 58.425778   100.98.0.8            100.99.128.9          TCP      66     [TCP Dup ACK 198819#1] 60214→8080 [ACK] Seq=222 Ack=2485 Win=31744 Len=0 TSval=121436509 TSecr=124898378
 198821 58.510120   69.12.41.164          100.98.0.8            MQ       598    MQPUT_REPLY       Hdl=0x00000002 Q=TEST.PING
 198822 58.510548   100.98.0.8            69.12.41.164          MQ       122    MQCLOSE           Hdl=0x00000002
 198823 58.511502   69.12.41.164          100.98.0.8            TCP      66     1414→52976 [ACK] Seq=3173 Ack=3213 Win=640 Len=0 TSval=1545117 TSecr=121436594
 198824 58.512060   69.12.41.164          100.98.0.8            MQ       118    MQCLOSE_REPLY    
 198825 58.512660   100.98.0.8            69.12.41.164          MQ       614    SPI               (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198826 58.513720   69.12.41.164          100.98.0.8            MQ       614    SPI_REPLY         Hdl=0x00000002 (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198827 58.514051   100.98.0.8            69.12.41.164          MQ       214    REQUEST_MSGS      Hdl=0x00000002 GlbMsgIdx=7, MaxLen=4096
 198828 58.515074   69.12.41.164          100.98.0.8            MQ       830    ASYNC_MESSAGE     Hdl=0x00000002 GlbMsgIdx=8, SegIdx=0, SegLen=293[Malformed Packet]
 198829 58.515768   100.98.0.8            69.12.41.164          MQ       118    MQCMIT           
 198830 58.519225   69.12.41.164          100.98.0.8            MQ       118    MQCMIT_REPLY     
 198831 58.519491   100.98.0.8            69.12.41.164          MQ       122    MQCLOSE           Hdl=0x00000002
 198832 58.520507   69.12.41.164          100.98.0.8            MQ       118    MQCLOSE_REPLY     Hdl=0x00000002
 198833 58.560500   100.98.0.8            69.12.41.164          TCP      66     52976→1414 [ACK] Seq=4017 Ack=4641 Win=23684 Len=0 TSval=121436644 TSecr=1545126
 198834 58.824928   100.98.0.8            69.12.41.164          MQ       386    SPI               (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198835 58.826036   69.12.41.164          100.98.0.8            MQ       386    SPI_REPLY         Hdl=0x00000002 (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198836 58.826070   100.98.0.8            69.12.41.164          TCP      66     33558→1414 [ACK] Seq=2329 Ack=2641 Win=23684 Len=0 TSval=121436909 TSecr=1545432
 198837 58.826539   100.98.0.8            69.12.41.164          MQ       894    MQPUT             Hdl=0x00000002[Malformed Packet]
 198838 58.828344   69.12.41.164          100.98.0.8            MQ       598    MQPUT_REPLY       Hdl=0x00000002 Q=TEST.PING
 198839 58.828583   100.98.0.8            69.12.41.164          MQ       122    MQCLOSE           Hdl=0x00000002
 198840 58.829431   69.12.41.164          100.98.0.8            MQ       118    MQCLOSE_REPLY    
 198841 58.829847   100.98.0.8            69.12.41.164          MQ       614    SPI               (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198842 58.830894   69.12.41.164          100.98.0.8            MQ       614    SPI_REPLY         Hdl=0x00000002 (OPEN) Typ=MQOT_Q Obj=TEST.PING
 198843 58.831190   100.98.0.8            69.12.41.164          MQ       214    REQUEST_MSGS      Hdl=0x00000002 GlbMsgIdx=7, MaxLen=4096
 198844 58.832213   69.12.41.164          100.98.0.8            MQ       830    ASYNC_MESSAGE     Hdl=0x00000002 GlbMsgIdx=8, SegIdx=0, SegLen=293[Malformed Packet]
 198845 58.832872   100.98.0.8            69.12.41.164          MQ       118    MQCMIT           
 198846 58.834959   69.12.41.164          100.98.0.8            MQ       118    MQCMIT_REPLY     
 198847 58.835272   100.98.0.8            69.12.41.164          MQ       122    MQCLOSE           Hdl=0x00000002
 198848 58.836643   69.12.41.164          100.98.0.8            MQ       118    MQCLOSE_REPLY     Hdl=0x00000002
 198849 58.876495   100.98.0.8            69.12.41.164          TCP      66     33558→1414 [ACK] Seq=4017 Ack=4641 Win=23684 Len=0 TSval=121436960 TSecr=1545442
 198850 59.025558   100.64.0.1            100.98.0.7            TLSv1.2  540    Application Data
 198851 59.025583   100.98.0.7            172.25.19.81          TCP      66     34358→443 [ACK] Seq=211 Ack=26971 Win=2072 Len=0 TSval=121437109 TSecr=135872104
 198852 59.025726   100.98.0.7            172.25.19.81          TLSv1.2  108    Application Data
 198853 59.026226   100.64.0.1            100.98.0.7            TCP      66     443→34358 [ACK] Seq=27445 Ack=253 Win=58 Len=0 TSval=135872105 TSecr=121437109
 198854 59.090612   6e:3f:90:e5:38:2e     Broadcast             ARP      42     Who has 100.101.0.9?  Tell 100.117.128.10
 198855 59.995730   100.64.0.1            100.98.0.7            TLSv1.2  522    Application Data
 198856 60.002399   100.98.0.0            100.98.0.6            TCP      74     48704→3000 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121438085 TSecr=0 WS=512
 198857 60.002444   100.98.0.6            100.98.0.0            TCP      74     3000→48704 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121438086 TSecr=121438085 WS=512
 198858 60.002465   100.98.0.0            100.98.0.6            TCP      66     48704→3000 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121438086 TSecr=121438086
 198859 60.002570   100.98.0.0            100.98.0.6            HTTP     186    GET /heartbeat HTTP/1.1
 198860 60.002579   100.98.0.6            100.98.0.0            TCP      66     3000→48704 [ACK] Seq=1 Ack=121 Win=26624 Len=0 TSval=121438086 TSecr=121438086
 198861 60.003504   100.98.0.6            100.98.0.0            HTTP     261    HTTP/1.1 200 OK  (text/html)
 198862 60.003524   100.98.0.0            100.98.0.6            TCP      66     48704→3000 [ACK] Seq=121 Ack=196 Win=28160 Len=0 TSval=121438087 TSecr=121438087
 198863 60.003622   100.98.0.0            100.98.0.6            TCP      66     48704→3000 [FIN, ACK] Seq=121 Ack=196 Win=28160 Len=0 TSval=121438087 TSecr=121438087
 198864 60.003833   100.98.0.6            100.98.0.0            TCP      66     3000→48704 [FIN, ACK] Seq=196 Ack=122 Win=26624 Len=0 TSval=121438087 TSecr=121438087
 198865 60.003848   100.98.0.0            100.98.0.6            TCP      66     48704→3000 [ACK] Seq=122 Ack=197 Win=28160 Len=0 TSval=121438087 TSecr=121438087
 198866 60.035524   100.98.0.7            172.25.19.81          TCP      66     34358→443 [ACK] Seq=253 Ack=27427 Win=2072 Len=0 TSval=121438119 TSecr=135873074
 198867 60.044211   100.98.0.2            172.25.19.81          TLSv1.2  112    Application Data
 198868 60.044777   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=730642 Ack=2221 Win=247 Len=0 TSval=135873123 TSecr=121438127
 198869 60.046889   100.64.0.1            100.98.0.2            TLSv1.2  131    Application Data
 198870 60.047052   100.64.0.1            100.98.0.2            TLSv1.2  15166  Application Data
 198871 60.047103   100.64.0.1            100.98.0.2            TLSv1.2  104    Application Data
 198872 60.047158   100.98.0.2            172.25.19.81          TCP      66     53128→443 [ACK] Seq=2267 Ack=745845 Win=12271 Len=0 TSval=121438130 TSecr=135873125
 198873 60.047238   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198874 60.047281   100.98.0.2            172.25.19.81          TLSv1.2  108    Application Data
 198875 60.047771   100.64.0.1            100.98.0.2            TCP      66     443→53128 [ACK] Seq=745845 Ack=2305 Win=247 Len=0 TSval=135873126 TSecr=121438130
 198876 60.091873   6e:3f:90:e5:38:2e     Broadcast             ARP      42     Who has 100.101.0.9?  Tell 100.117.128.10
 198877 60.113066   66:f7:68:a3:bd:be     76:f5:6d:d6:fe:ff     ARP      42     Who has 100.98.0.8?  Tell 100.103.128.12
 198878 60.113088   76:f5:6d:d6:fe:ff     66:f7:68:a3:bd:be     ARP      42     100.98.0.8 is at 76:f5:6d:d6:fe:ff
 198879 60.116501   be:f3:86:69:94:76     66:f7:68:a3:bd:be     ARP      42     Who has 100.103.128.12?  Tell 100.98.0.0
 198880 60.117064   66:f7:68:a3:bd:be     be:f3:86:69:94:76     ARP      42     100.103.128.12 is at 66:f7:68:a3:bd:be
 198881 60.118635   46:9f:79:8c:75:bc     76:f5:6d:d6:fe:ff     ARP      42     Who has 100.98.0.8?  Tell 100.99.128.9
 198882 60.118650   76:f5:6d:d6:fe:ff     46:9f:79:8c:75:bc     ARP      42     100.98.0.8 is at 76:f5:6d:d6:fe:ff
 198883 60.140489   9a:61:c8:96:90:d6     46:9f:79:8c:75:bc     ARP      42     Who has 100.99.128.9?  Tell 100.98.0.7
 198884 60.141544   46:9f:79:8c:75:bc     9a:61:c8:96:90:d6     ARP      42     100.99.128.9 is at 46:9f:79:8c:75:bc
 198885 60.336721   100.98.0.4            100.103.128.12        DNS      108    Standard query 0xf3da  A ptplace-bff.default.svc.cluster.local.colinx.com
 198886 60.336746   100.98.0.4            100.103.128.12        DNS      108    Standard query 0xf3da  A ptplace-bff.default.svc.cluster.local.colinx.com
 198887 60.336810   100.98.0.4            100.103.128.12        DNS      108    Standard query 0x4753  AAAA ptplace-bff.default.svc.cluster.local.colinx.com
 198888 60.336822   100.98.0.4            100.103.128.12        DNS      108    Standard query 0x4753  AAAA ptplace-bff.default.svc.cluster.local.colinx.com
 198889 60.337660   100.103.128.12        100.98.0.4            DNS      108    Standard query response 0xf3da No such name
 198890 60.337661   100.103.128.12        100.98.0.4            DNS      108    Standard query response 0xf3da No such name
 198891 60.337699   100.103.128.12        100.98.0.4            DNS      108    Standard query response 0x4753 No such name
 198892 60.337700   100.103.128.12        100.98.0.4            DNS      108    Standard query response 0x4753 No such name
 198893 60.337832   100.98.0.4            100.103.128.12        DNS      97     Standard query 0xa666  A ptplace-bff.default.svc.cluster.local
 198894 60.337852   100.98.0.4            100.103.128.12        DNS      97     Standard query 0xa666  A ptplace-bff.default.svc.cluster.local
 198895 60.337902   100.98.0.4            100.103.128.12        DNS      97     Standard query 0x8a85  AAAA ptplace-bff.default.svc.cluster.local
 198896 60.337913   100.98.0.4            100.103.128.12        DNS      97     Standard query 0x8a85  AAAA ptplace-bff.default.svc.cluster.local
 198897 60.338937   100.103.128.12        100.98.0.4            DNS      113    Standard query response 0xa666  A 100.69.135.10
 198898 60.338960   100.103.128.12        100.98.0.4            DNS      151    Standard query response 0x8a85 
 198899 60.841740   100.98.0.4            100.98.0.8            TCP      74     49678→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121438925 TSecr=0 WS=512
 198900 60.841804   100.98.0.4            100.98.0.8            TCP      74     [TCP Out-Of-Order] 49678→8080 [SYN] Seq=0 Win=26616 Len=0 MSS=8872 SACK_PERM=1 TSval=121438925 TSecr=0 WS=512
 198901 60.841828   100.98.0.8            100.98.0.4            TCP      74     8080→49678 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121438925 TSecr=121438925 WS=512
 198902 60.841834   100.98.0.8            100.98.0.4            TCP      74     [TCP Out-Of-Order] 8080→49678 [SYN, ACK] Seq=0 Ack=1 Win=26580 Len=0 MSS=8872 SACK_PERM=1 TSval=121438925 TSecr=121438925 WS=512
 198903 60.841864   100.98.0.4            100.98.0.8            TCP      66     49678→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121438925 TSecr=121438925
 198904 60.841878   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198903#1] 49678→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121438925 TSecr=121438925
 198905 60.841867   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198903#2] 49678→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121438925 TSecr=121438925
 198906 60.841890   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198903#3] 49678→8080 [ACK] Seq=1 Ack=1 Win=26624 Len=0 TSval=121438925 TSecr=121438925
 198907 60.841961   100.98.0.4            100.98.0.8            HTTP     216    GET /ptpr/api/labels?tag=home&tag=layout&tag=addresses HTTP/1.1 
 198908 60.841972   100.98.0.4            100.98.0.8            HTTP     216    [TCP Retransmission] GET /ptpr/api/labels?tag=home&tag=layout&tag=addresses HTTP/1.1 
 198909 60.841979   100.98.0.8            100.98.0.4            TCP      66     8080→49678 [ACK] Seq=1 Ack=151 Win=28160 Len=0 TSval=121438925 TSecr=121438925
 198910 60.841983   100.98.0.8            100.98.0.4            TCP      78     [TCP Dup ACK 198909#1] 8080→49678 [ACK] Seq=1 Ack=151 Win=28160 Len=0 TSval=121438925 TSecr=121438925 SLE=1 SRE=151
 198911 60.842403   100.98.0.8            100.99.128.12         MEMCACHE 79     get ping-n1 
 198912 60.843595   100.99.128.12         100.98.0.8            MEMCACHE 71     END 
 198913 60.843655   100.98.0.8            100.99.128.12         TCP      66     57650→11211 [ACK] Seq=513645 Ack=11624 Win=52 Len=0 TSval=121438927 TSecr=124900797
 198914 60.844204   100.98.0.8            172.25.81.195         PGSQL    110    >P/B/D/E/S
 198915 60.844544   172.25.81.195         100.98.0.8            PGSQL    92     <1/2/n/I/Z
 198916 60.844574   100.98.0.8            172.25.81.195         TCP      66     36356→5432 [ACK] Seq=133475 Ack=26946 Win=2384 Len=0 TSval=121438928 TSecr=409148511
 198917 60.844652   100.98.0.8            172.25.81.195         PGSQL    280    >B/E/S
 198918 60.845207   172.25.81.195         100.98.0.8            PGSQL    91     <2/C/Z
 198919 60.846457   100.98.0.8            100.98.0.4            TCP      8258   [TCP segment of a reassembled PDU]
 198920 60.846488   100.98.0.4            100.98.0.8            TCP      66     49678→8080 [ACK] Seq=151 Ack=8193 Win=44544 Len=0 TSval=121438929 TSecr=121438929
 198921 60.846498   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198920#1] 49678→8080 [ACK] Seq=151 Ack=8193 Win=44544 Len=0 TSval=121438929 TSecr=121438929
 198922 60.846536   100.98.0.8            100.98.0.4            TCP      419    [TCP segment of a reassembled PDU]
 198923 60.846568   100.98.0.4            100.98.0.8            TCP      66     49678→8080 [ACK] Seq=151 Ack=8546 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198924 60.846580   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198923#1] 49678→8080 [ACK] Seq=151 Ack=8546 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198925 60.846722   100.98.0.8            100.98.0.4            HTTP     71     HTTP/1.1 200   (application/json)
 198926 60.846948   100.98.0.4            100.98.0.8            TCP      66     49678→8080 [ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198927 60.846960   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198926#1] 49678→8080 [ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198928 60.846954   100.98.0.8            100.99.128.12         MEMCACHE 427    set 521DF64249CA28FE982DCB06C1A2DA8E-n1 2048 10794 303 
 198929 60.847073   100.98.0.4            100.98.0.8            TCP      66     49678→8080 [FIN, ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198930 60.847084   100.98.0.4            100.98.0.8            TCP      66     [TCP Out-Of-Order] 49678→8080 [FIN, ACK] Seq=151 Ack=8551 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198931 60.847104   100.98.0.8            100.98.0.4            TCP      78     8080→49678 [ACK] Seq=8551 Ack=152 Win=28160 Len=0 TSval=121438930 TSecr=121438930 SLE=151 SRE=152
 198932 60.847180   100.98.0.8            100.98.0.4            TCP      66     8080→49678 [FIN, ACK] Seq=8551 Ack=152 Win=28160 Len=0 TSval=121438930 TSecr=121438930
 198933 60.847200   100.98.0.4            100.98.0.8            TCP      66     49678→8080 [ACK] Seq=152 Ack=8552 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198934 60.847209   100.98.0.4            100.98.0.8            TCP      66     [TCP Dup ACK 198933#1] 49678→8080 [ACK] Seq=152 Ack=8552 Win=60928 Len=0 TSval=121438930 TSecr=121438930
 198935 60.847951   100.99.128.12         100.98.0.8            MEMCACHE 74     STORED 
 198936 60.852416   100.98.0.4            100.98.0.7            DNS      123    Standard query 0xad91  A ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198937 60.852439   100.98.0.4            100.98.0.7            DNS      123    Standard query 0xad91  A ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198938 60.852478   100.98.0.4            100.98.0.7            DNS      123    Standard query 0xed16  AAAA ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198939 60.852486   100.98.0.4            100.98.0.7            DNS      123    Standard query 0xed16  AAAA ptplace-bff.default.svc.cluster.local.default.svc.cluster.local
 198940 60.852783   100.98.0.7            100.98.0.4            DNS      216    Standard query response 0xad91 No such name
 198941 60.852905   100.98.0.7            100.98.0.4            DNS      216    Standard query response 0xed16 No such name
 198942 60.853035   100.98.0.4            100.98.0.7            DNS      115    Standard query 0x73ab  A ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198943 60.853067   100.98.0.4            100.98.0.7            DNS      115    Standard query 0x73ab  A ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198944 60.853089   100.98.0.4            100.98.0.7            DNS      115    Standard query 0x2247  AAAA ptplace-bff.default.svc.cluster.local.svc.cluster.local
 198945 60.853109   100.98.0.4            100.98.0.7            DNS      115    Standard query 0x2247  AAAA ptplace-bff.default.svc.cluster.local.svc.cluster.local
brb commented

Thanks, but I still need the two iptables-save outputs. Also, can you please attach the foo.pcap file.

Does the client pod run on the same host as the target DNS server?

@brb will do. Those iptables-save outputs are from within the weave container on the host running the client pod, correct? I can arrange for the client to be on the same host as the target DNS pod or not; which would you prefer?

brb commented

On the same host - the client and the DNS. Thanks!

@brb roger that!

@brb Phew! OK, that's a hard test to coordinate. Fortunately, the pcap is small: the very first lookup timed out.

Test conditions: from a pod with a DNS pod on the same host, I ran iptables-save -c before and after, and captured a pcap during the test.

The test was a curl request that looked like this:

curl -o /dev/null -s -w "#%{time_total}" "http://ptplace-bff.default.svc.cluster.local/ptpr/api/labels?tag=home&tag=layout&tag=addresses"
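A single curl run only shows whether that one request was slow. To estimate how often the delay shows up, the lookup can be repeated and timed on its own. The following is a rough Python sketch, not something used in this thread: the function name `time_lookups`, the 4.5 second threshold, and the injectable `resolve` and `clock` parameters are all assumptions made for illustration. On glibc systems `socket.getaddrinfo` goes through the system resolver, so it should exercise the same path as curl.

```python
import socket
import time


def time_lookups(host, attempts=100, slow_after=4.5,
                 resolve=socket.getaddrinfo, clock=time.monotonic):
    """Resolve `host` repeatedly; return (durations, number of slow lookups).

    `slow_after` is an assumed threshold just below the 5 s resolver timeout.
    `resolve` and `clock` are injectable so the loop can be tested offline.
    """
    durations = []
    for _ in range(attempts):
        start = clock()
        try:
            resolve(host, 80)
        except socket.gaierror:
            # A failed lookup still tells us how long the resolver waited.
            pass
        durations.append(clock() - start)
    slow = sum(1 for d in durations if d >= slow_after)
    return durations, slow
```

Calling `time_lookups("ptplace-bff.default.svc.cluster.local")` from inside a client pod and comparing the slow count against the number of attempts gives a rough failure rate to compare before and after a change.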

Here are the ips of all the pods on this host at the time of the test:

dcowden@ubuntu:~/gitwork$ kc get po -o wide --all-namespaces | grep 238

default       dc-debug-856bf6cd69-lqb8p                               1/1       Running   0          11h       100.99.128.5     ip-172-25-51-238.ec2.internal
default       echoheaders-vjdb6                                       1/1       Running   0          11h       100.99.128.6     ip-172-25-51-238.ec2.internal
default       external-comet-6f497c7955-t6k2c                         1/1       Running   0          57m       100.99.128.10    ip-172-25-51-238.ec2.internal
default       ptp-react-57d65c788b-8ssts                              1/1       Running   0          11h       100.99.128.7     ip-172-25-51-238.ec2.internal
default       schaeffler-elasticsearch-0                              1/1       Running   0          12h       100.99.128.4     ip-172-25-51-238.ec2.internal
kube-system   kube-dns-7f56f9f8c7-hp7w5                               3/3       Running   0          11h       100.99.128.9     ip-172-25-51-238.ec2.internal
kube-system   kube-proxy-ip-172-25-51-238.ec2.internal                1/1       Running   0          12h       172.25.51.238    ip-172-25-51-238.ec2.internal
kube-system   weave-net-lzds9                                         2/2       Running   1          12h       172.25.51.238    ip-172-25-51-238.ec2.internal
ops           dd-agent-hg2jj                                          1/1       Running   0          12h       100.99.128.2     ip-172-25-51-238.ec2.internal
ops           fluentd-log-collector-d8rn6                             1/1       Running   0          12h       100.99.128.1     ip-172-25-51-238.ec2.internal
ops           ingress-lb-8l2nj                                        1/1       Running   0          12h       172.25.51.238    ip-172-25-51-238.ec2.internal
ops           iperf-server-daemonset-pbtdc                            1/1       Running   0          12h       100.99.128.3     ip-172-25-51-238.ec2.internal

Of course 100.99.128.9 and 100.99.128.5 are the two relevant actors here

tcpdump reported some packets dropped by the kernel during the capture; the pcap is attached.

[root@ip-172-25-51-238 ~]# tcpdump -i weave -w /tmp/foo.pcap
tcpdump: listening on weave, link-type EN10MB (Ethernet), capture size 262144 bytes
^C8563 packets captured
8632 packets received by filter
69 packets dropped by kernel

iptables-save -c ( from within the weave container on the host ) is attached as 'before-test.txt'
iptables-save -c after the test is after-test.txt
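For anyone comparing two dumps like these by hand: `iptables-save -c` prefixes every rule with a `[packets:bytes]` counter, so the rules that matched during the test can be found by diffing the packet counts. Below is a minimal Python sketch of that comparison. The helper names `rule_counters` and `changed_rules` are made up for this example, and it assumes that no rule text is repeated within a table (duplicates would overwrite each other).

```python
import re

# Matches rule lines such as: [12:720] -A KUBE-SERVICES -d ... -j ...
RULE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A .+)$")


def rule_counters(dump):
    """Map (table, rule text) to the packet counter from an `iptables-save -c` dump."""
    counters = {}
    table = None
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("*"):
            # Table header, e.g. "*nat" or "*filter".
            table = line[1:]
            continue
        match = RULE.match(line)
        if match:
            counters[(table, match.group(3))] = int(match.group(1))
    return counters


def changed_rules(before, after):
    """Return {(table, rule): packet delta} for rules whose counter changed."""
    b, a = rule_counters(before), rule_counters(after)
    return {key: a[key] - b.get(key, 0) for key in a if a[key] != b.get(key, 0)}
```

Feeding it the contents of before-test.txt and after-test.txt would list only the rules hit in between, which narrows down which NAT rules the test traffic actually traversed.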

packet capture is attached as foo.pcap.

Here's a listing of the weave container, confirming the weave image id i'm using

[root@ip-172-25-51-238 ~]# docker ps | grep weave
69ddf4ad9d2b        brb0/weave-kube@sha256:84010a75a045b66cf79915b0c0bc44dce59692a30dbd6e80b00149301e5e9a4c                                        "/home/weave/launc..."   12 hours ago        Up 12 hours                             k8s_weave_weave-net-lzds9_kube-system_898e6228-4f70-11e8-821b-069cefeb2b84_1

Thanks for offering the detailed help. So awesome!

after-test.txt
before-test.txt
foo.pcap.gz

brb commented

@dcowden Many thanks for the traces!

In your case, DNAT rules installed by kube-proxy are to blame, and the --random-fully flag won't help.

I'm still trying to understand some behavior of the kernel when UDP is combined with iptables-based LB. Going to post more later.

@brb thanks for looking! For now, we resorted to just putting options single-request-reopen into our pods' resolv.conf. It works, but I'm sure things are far from optimal as they are.

That said, I think this is a problem lots of people have; delays can be caused by so many things that most people assume it's not packet loss.

I want to provide a couple of other details about our configuration that may make us different:

(1) we're using weave's encryption feature. In fact, this is why we chose weave. Since we use k8s on aws, this allows us to have a secure cluster very easily.

(2) this cluster was built with kops, using the weave option from within kops

(3) We're using our own base image for this cluster, which is centos ( not the default for kops).

Let me know if there's more information I can provide, and thanks again for looking!

Well ok i have more findings.

I thought that options single-request-reopen fixed the problem, but actually it just makes it less likely.

I also found out (I think) why we see this perhaps more often than most. Our application uses a nodejs frontend and a java/api backend. In our flow, the frontend makes 3 separate api requests to the backend, and nodejs runs these in parallel. The result is that we nearly always have 3 separate requests going to the dns pod at the same time. I suspect this triggers the DNAT issue.

I'm able to eliminate the issue by going to the pods and putting the service IP of the backend into the hosts files of the frontend. But of course this isn't a solution.

So now I'm back to actually having to fix the root issue.

brb commented

@dcowden

Based on the provided traces (https://github.com/weaveworks/weave/files/1975806/foo.pcap.gz), the following is happening:

  1. The glibc resolver uses the same UDP socket for parallel queries (A and AAAA). As UDP is a connectionless protocol, calling connect(2) does not send any packet => no entry is created in the conntrack hash table.
  2. The kube-dns service is accessible via VIP which is backed by iptables DNAT rules. The relevant ones from the nat table in your case:
-A KUBE-SERVICES -d 100.64.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
<..>
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-JILKODJ63HVFF6B2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-LFXGESA25DLV4HVG
<..>
-A KUBE-SEP-JILKODJ63HVFF6B2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 100.117.128.12:53
-A KUBE-SEP-LFXGESA25DLV4HVG -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 100.99.128.9:53
  3. During DNAT translation, the kernel calls the relevant netfilter hooks in the following order:
    1. nf_conntrack_in: creates conntrack hash object, adds it to the unconfirmed entries list.
    2. nf_nat_ipv4_fn: does the translation, updates the conntrack tuple.
    3. nf_conntrack_confirm: confirms the entry, adds it to the hash table.
  4. The two parallel UDP requests (518 and 524 in the pcap) race for the entry confirmation. Additionally, they end up using different DNS endpoints. 518 wins the race, while 524 loses. Because of the latter, the insert_failed counter is incremented (check with conntrack -S) and the request is dropped => you get the timeout.
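The sequence above can be illustrated with a toy model. This is a deliberately simplified Python sketch, not kernel code: the `Conntrack` class and `simulate` function are invented for illustration, the client source port 40000 is made up, and the real confirmation step also checks the reply-direction tuple. It shows only why the second of two packets from the same socket fails to confirm when both passed the first two hooks before either was confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class Conntrack:
    """Toy stand-in for the confirmed conntrack table (original direction only)."""
    confirmed: dict = field(default_factory=dict)  # original tuple -> DNAT target
    insert_failed: int = 0

    def confirm(self, orig_tuple, dnat_target):
        """Insert the entry unless one with the same original tuple already exists."""
        if orig_tuple in self.confirmed:
            self.insert_failed += 1
            return False  # the packet is dropped
        self.confirmed[orig_tuple] = dnat_target
        return True


def simulate():
    table = Conntrack()
    # Both queries leave the same UDP socket, so they share one original tuple
    # (client pod -> kube-dns VIP). The source port is a hypothetical value.
    orig = ("udp", "100.99.128.5", 40000, "100.64.0.10", 53)
    # Entry creation and DNAT already ran for both packets, each with its own
    # unconfirmed entry and its own randomly chosen endpoint.
    pending = [("first", "100.99.128.9:53"), ("second", "100.117.128.12:53")]
    results = [(name, table.confirm(orig, target)) for name, target in pending]
    return results, table.insert_failed
```

In this model `simulate()` confirms the first packet and rejects the second, bumping `insert_failed` by one, which mirrors the dropped request and the counter increase described in the last step.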

As I mentioned above, the --random-fully flag does not help here, as it's only for SNAT which is not the culprit in your case.

@Quentin-M

As you use the ipvs backend, I'm curious to see your iptables-save output.

brb commented

@dcowden

I suspect this triggers the DNAT issue.

Could you verify this by checking insert_failed counter value with conntrack -S?
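Since `conntrack -S` prints one line of counters per CPU, the values have to be added up to get a host-wide figure. A small Python sketch of that, assuming the usual `key=value` layout of the output (the function name `insert_failed_total` is made up for this example):

```python
import re


def insert_failed_total(stats_output):
    """Sum the insert_failed counter over every per-CPU line of `conntrack -S` output."""
    return sum(int(n) for n in re.findall(r"\binsert_failed=(\d+)", stats_output))
```

Taking one reading before the test and one after, and comparing the two totals, shows whether any insertions failed while the test ran; a single absolute value only says that failures happened at some point since the counters were last reset.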

@brb Will do, but I will not be able to do it until later this week. That said, I can't imagine that your analysis is wrong. What do you think is the fix? And/or can you suggest workarounds?

I'm honestly shocked that most of the internet isn't saying 'well kubernetes is great, but you're going to have packet loss issues'.

We do not use UDP for much other than DNS, so one idea I've been thinking about is to somehow run kube-dns as a daemonset with hostNetwork=true, thus removing some of the NAT. But I think that'd be hard to do with kops, because kops bundles the kube-dns manifests, and we'd have to override them.

And even so, that'd be a workaround (albeit it is very reasonable to assume that DNS is the only UDP protocol that would expose this race condition so frequently).

Another workaround, based on your analysis, would be to avoid using a VIP, and instead configure pods to use the individual pods with a round robin cluster ip A records. I'm not sure if that configuration is possible.

@brb huge kudos for figuring that out!

@dcowden when I used to do electronic trading for a living I would find that people live with the most egregious network problems and never think "this is really broken". A tiny minority of people care enough to look at what is really going on.

Also the issue is sensitive to what exact technologies you use - for instance at Weaveworks we write most services in Go so they don't use the glibc resolver.

run kube-dns as a daemonset

so it would be on every host, and could be addressed using the host's own IP? I've seen discussions along those lines; unfortunately changing resolv.conf to point at that IP requires a Kubernetes change.

instead configure pods to use the individual pods with a round robin cluster ip A records.

Not really following this suggestion. AFAIK resolv.conf has to have the IP addresses of servers, not DNS names. If we could get Kubernetes to keep the IP addresses of kube-dns pods static across restarts, that would be plausible, but not currently a feature.

so it would be on every host, and could be addressed using the host's own IP? I've seen discussions along those lines; unfortunately changing resolv.conf to point at that IP requires a Kubernetes change.

We are already using a hack in our container entry point that updates resolv.conf on pod start to add the option single-request-reopen. We would need to use that in combination with the downward API to inject the host IP. It stinks, but maybe it would work?
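
For anyone wanting to try the same hack, a rough sketch of such an entry-point step could look like this. It is an assumption-laden illustration, not what we actually run: `HOST_IP` is a hypothetical environment variable that you would populate yourself (for example from the downward API's `status.hostIP` field), and `patch_resolv_conf` is a made-up helper name.

```python
import os


def patch_resolv_conf(text, host_ip, option="single-request-reopen"):
    """Point the resolver at host_ip and make sure `option` is set."""
    out = []
    nameserver_written = False
    options_seen = False
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "nameserver":
            # Replace all existing nameserver lines with a single one.
            if not nameserver_written:
                out.append(f"nameserver {host_ip}")
                nameserver_written = True
            continue
        if fields and fields[0] == "options":
            options_seen = True
            if option not in fields[1:]:
                line = f"{line.rstrip()} {option}"
        out.append(line)
    if not nameserver_written:
        out.append(f"nameserver {host_ip}")
    if not options_seen:
        out.append(f"options {option}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    host_ip = os.environ.get("HOST_IP")  # hypothetical variable name
    if host_ip:
        path = "/etc/resolv.conf"
        with open(path) as f:
            patched = patch_resolv_conf(f.read(), host_ip)
        # Write in place; the file is typically a bind mount inside a container.
        with open(path, "w") as f:
            f.write(patched)
```

The search domains are left untouched, and running the function twice gives the same result, so it should be safe to call on every container start.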

Not really following this suggestion. AFAIK resolv.conf has to have the IP addresses of servers, not DNS names. If we could get Kubernetes to keep the IP addresses of kube-dns pods static across restarts, that would be plausible, but not currently a feature.

yeah you're right, there would be no way to assign static ips to the pods to make this work.

@brb yes it appears to be the case. below is the output on the same host on which the tests above ran.

/home/weave # conntrack -S
cpu=0   	searched=630288 found=15093196 new=1346365 invalid=34 ignore=647629 delete=1408965 delete_list=1408867 insert=1344752 insert_failed=92 drop=0 early_drop=0 error=0 search_restart=0 
cpu=1   	searched=846871 found=28126666 new=1919780 invalid=74 ignore=650870 delete=1855000 delete_list=1854877 insert=1921172 insert_failed=107 drop=0 early_drop=0 error=0 search_restart=0 
cpu=2   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=3   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=4   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=5   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=6   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=7   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=8   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=9   	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=10  	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=11  	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=12  	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=13  	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
cpu=14  	searched=0 found=0 new=0 invalid=0 ignore=0 delete=0 delete_list=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0 
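
If you want to track this over time rather than eyeball it, a small sketch like the following sums a counter across the per-CPU lines. The parsing only assumes the `key=value` layout shown in the output above; the helper names are made up, and the sample is trimmed to the relevant fields of the first three lines.

```python
import re
import subprocess

# Trimmed copy of the first three lines of the output above.
SAMPLE = """\
cpu=0   	searched=630288 found=15093196 insert=1344752 insert_failed=92 drop=0
cpu=1   	searched=846871 found=28126666 insert=1921172 insert_failed=107 drop=0
cpu=2   	searched=0 found=0 insert=0 insert_failed=0 drop=0
"""


def parse_conntrack_stats(output):
    """Turn each `conntrack -S` line into a dict of integer counters."""
    rows = []
    for line in output.splitlines():
        pairs = re.findall(r"(\w+)=(\d+)", line)
        if pairs:
            rows.append({key: int(value) for key, value in pairs})
    return rows


def total(rows, counter):
    """Sum one counter over all CPUs."""
    return sum(row.get(counter, 0) for row in rows)


def read_live_stats():
    """Run conntrack -S on the host (needs root and conntrack-tools)."""
    result = subprocess.run(
        ["conntrack", "-S"], capture_output=True, text=True, check=True
    )
    return parse_conntrack_stats(result.stdout)


rows = parse_conntrack_stats(SAMPLE)
print("insert_failed total:", total(rows, "insert_failed"))  # 92 + 107 + 0 = 199
```

Calling `total(read_live_stats(), "insert_failed")` before and after a batch of lookups should show whether the counter moves together with the timeouts.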

brb commented

@dcowden What is your CentOS and kernel vsn?

@brb

[root@ip-172-25-83-254 ~]# more /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@ip-172-25-83-254 ~]# uname -a
Linux ip-172-25-83-254.colinx.com 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I would just like to add here that the single-request(-reopen) workaround does not work with Alpine-based containers, as musl does not support the option (see below). Unfortunately, Alpine Linux is the base image for 90% of our infrastructure.

src/network/resolvconf.c

                if (!strncmp(line, "options", 7) && isspace(line[7])) {
                        p = strstr(line, "ndots:");
                        if (p && isdigit(p[6])) {
                                p += 6;
                                unsigned long x = strtoul(p, &z, 10);
                                if (z != p) conf->ndots = x > 15 ? 15 : x;
                        }
                        p = strstr(line, "attempts:");
                        if (p && isdigit(p[9])) {
                                p += 9;
                                unsigned long x = strtoul(p, &z, 10);
                                if (z != p) conf->attempts = x > 10 ? 10 : x;
                        }
                        p = strstr(line, "timeout:");
                        if (p && (isdigit(p[8]) || p[8]=='.')) {
                                p += 8;
                                unsigned long x = strtoul(p, &z, 10);
                                if (z != p) conf->timeout = x > 60 ? 60 : x;
                        }
                        continue;
                }

src/network/lookup.h

struct resolvconf {
        struct address ns[MAXNS];
        unsigned nns, attempts, ndots;
        unsigned timeout;
};
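
To illustrate what the excerpt above implies, here is a rough Python transcription of just that `options` handling. It is only a sketch of the quoted C, not a replacement for it: the function name is made up, and it returns only the values it finds rather than guessing at musl's defaults.

```python
import re

# Upper bounds taken from the C excerpt above.
LIMITS = {"ndots": 15, "attempts": 10, "timeout": 60}


def parse_options_like_musl(line):
    """Return the resolv.conf options the quoted musl code would pick up."""
    conf = {}
    # Mirrors: !strncmp(line, "options", 7) && isspace(line[7])
    if not (line.startswith("options") and line[7:8].isspace()):
        return conf
    for key, cap in LIMITS.items():
        idx = line.find(key + ":")  # like strstr(): first occurrence only
        if idx == -1:
            continue
        digits = re.match(r"[0-9]+", line[idx + len(key) + 1:])
        if digits:
            conf[key] = min(int(digits.group()), cap)
    return conf


print(parse_options_like_musl("options ndots:5 single-request-reopen timeout:2"))
```

Anything that is not ndots, attempts or timeout, including single-request and single-request-reopen, is simply never looked at, which is why the workaround has no effect on Alpine.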

I have reached out on freenode's #musl channel, but unfortunately it does not seem like there is much desire to add support for the option:

[16:19] <dalias> why not fix the bug causing it?
[16:20] <dalias> sprry
[16:20] <dalias> the option is not something that can be added, its contrary to the lookup architecture
[17:39] <dalias> quentinm, thanks for the report. i just don't know any good way to work around it on our side without nasty hacks
[17:40] <dalias> the architecture is not designed to support sequential queries

@dcowden @bboreham @brb @dcowden @xiaoxubeii

For what it's worth: I simply switched a two-nodes cluster that was broken (5s latency for every single curl, except when single-request was used), from the latest weave to calico 2.6, and the issue went away immediately. None of my pods experience the DNS issue where AAAA packets would get dropped anymore.

I will be happy to grant access to a cluster where the issue is present if that means we will get some help 💯

@Quentin-M thanks for the report. We'll try this next. For now we're working around it, but it's annoying to say the least! Our problem is that calico doesn't support encryption on the cluster overlay. weave does this better than any of the others, so i hope we can keep using weave!

@dcowden @bboreham @brb @dcowden @xiaoxubeii

Another very interesting note: when FASTDP is disabled (but encryption is still on), the issue also disappears. I tested this on 4 clusters, with regular and jumbo MTUs.

How exactly did you disable fastdp?

brb commented

Another very interesting note, when FASTDP is disabled

My guess is that, because sleeve mode is slower, races are less likely to happen, but they are not completely avoided.

brb commented

@Quentin-M

For what it's worth: I simply switched a two-nodes cluster that was broken (5s latency for every single curl, except when single-request was used), from the latest weave to calico 2.6, and the issue went away immediately.

That's interesting. Do you use the IP-in-IP tunneling with Calico?

@brb @bboreham

How exactly did you disable fastdp?

Once, I simply dropped the following into Weave's manifest, used reset, and let Kubernetes roll Weave. Later, I did the same thing but also killed all the pods. Another time, I edited the manifest, then killed all the nodes, letting new identical ones come back with fresh configuration/networking and re-scheduling all the pods. Every time, I verified using weave --local status connections.

        - name: WEAVE_NO_FASTDP
          value: "true"

That's interesting. Do you use the IP-in-IP tunneling with Calico?

Yes, IP-in-IP set to always. Happy to drop the manifest if necessary.

My guess is that, because sleeve mode is slower, races are less likely to happen, but they are not completely avoided.

That was one of my ideas too, yeah.. Calico is supposedly "pretty fast" as well, even in IPIP (I believe it is done in the kernel too), but the timing might be just different enough to avoid it. Or, the problem is different.

Thank you.

When a single pod is used to wget/curl a target, a tc policy that delays every other DNS datagram by, say, 10ms seems to alleviate the issue entirely: netem gap 2 delay 10ms reorder 100%. However, this may not work as well when multiple pods are making requests, as the policy applies to the whole node and therefore may not induce a delay between the two parallel A/AAAA datagrams coming out of a single pod, but between two A requests from different pods. This may not actually be true, and it may work properly depending on how SNAT/DNAT/conntrack operates, but I am not expert enough to say.

Another interesting rule is to add a random delay to every single outgoing DNS datagram, but this does not work 100% of the time, even with a single pod making requests, as the two A/AAAA datagrams may be sent with delays that are close enough to each other that the race still happens. There might be a smart thing to do here to make it work reliably.. Maybe rate control.
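
A back-of-the-envelope sketch of why random delay alone cannot be 100% reliable. This is a toy model with loud assumptions: both datagrams are sent at the same instant, each gets an independent delay drawn uniformly from [0, max_delay], and the race is taken to happen whenever they end up less than `window` apart. The numbers are hypothetical, and real netem distributions (such as pareto) will behave differently.

```python
import random


def collision_probability(window_ms, max_delay_ms):
    """P(|d1 - d2| < window) for two independent uniform delays on [0, max]."""
    if window_ms >= max_delay_ms:
        return 1.0
    return 1.0 - (1.0 - window_ms / max_delay_ms) ** 2


def simulate(window_ms, max_delay_ms, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d1 = rng.uniform(0, max_delay_ms)
        d2 = rng.uniform(0, max_delay_ms)
        if abs(d1 - d2) < window_ms:
            hits += 1
    return hits / trials


# Hypothetical numbers: 0.1 ms race window, delays of up to 4 ms.
print(collision_probability(0.1, 4.0))
print(simulate(0.1, 4.0))
```

Under these assumptions, widening the delay range shrinks the probability but never brings it to zero, which matches what we observed.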

The traffic shaping may be applied to DNS requests only, using filters, but due to the low-level nature of the issue, the drops may also happen to any traffic on the network.. We are, for example, about to migrate major graphite/statsd clusters that send a high volume of UDP datagrams, and I am worried the issue will also occur there but become much more problematic, especially as the datagrams will have to be shaped on the ingress side.

Here is the workaround we are about to start using: https://github.com/Quentin-M/weave-tc/blob/master/weave-tc.sh, which seems to reduce the likelihood of the race significantly. Using it is as simple as adding the following container to the weave DaemonSet:

        - name: weave-tc
          image: 'qmachu/weave-tc:0.0.1'
          securityContext:
            privileged: true
          volumeMounts:
            - name: xtables-lock
              mountPath: /run/xtables.lock
            - name: lib-tc
              mountPath: /lib/tc

Is there really nothing that the weave team can do

What we're doing is gathering data to understand the issue(s) and analyzing it. Sorry if this comes across as "nothing".

@Quentin-M holy cow, man, we will try that solution out and see if it works for us. What side effects should we watch out for?

It's been a long time since I have read a shell script that was so far over my head.. that's some highly impressive work!

@Quentin-M I am getting No distribution data for pareto (/lib/tc//pareto.dist: No such file or directory). Does the host need to have something installed as well? What should lib-tc point to on the host? Maybe you can provide your deployment set yaml for me to compare :)

@thomaschaaf Absolutely!

I mount /run/xtables.lock and /lib/tc.
Pareto should already be on the host, it is part of iproute2, which is essentially the same everywhere.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: weave-net
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: system:weave-net
  namespace: kube-system
rules:
  - apiGroups:
      - ''
    resources:
      - pods
      - namespaces
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: system:weave-net
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:weave-net
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: weave-net
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: system:weave-net
  namespace: kube-system
rules:
  - apiGroups:
      - ''
    resourceNames:
      - weave-net
    resources:
      - configmaps
    verbs:
      - get
      - update
  - apiGroups:
      - ''
    resources:
      - configmaps
    verbs:
      - create
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: system:weave-net
  namespace: kube-system
roleRef:
  kind: Role
  name: system:weave-net
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: weave-net
    namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
  labels:
    k8s-app: weave-net
spec:
  selector:
    matchLabels:
      k8s-app: weave-net
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: weave-net
    spec:
      containers:
        - name: weave
          command:
            - /home/weave/launch.sh
          env:
            - name: WEAVE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: weave-password
                  key: password
            - name: WEAVE_MTU
              value: '8912'
            - name: IPALLOC_RANGE
              value: '172.16.0.0/16'
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          image: 'weaveworks/weave-kube:2.3.0'
          livenessProbe:
            httpGet:
              host: 127.0.0.1
              path: /status
              port: 6784
            initialDelaySeconds: 30
          securityContext:
            privileged: true
          volumeMounts:
            - name: weavedb
              mountPath: /weavedb
            - name: cni-bin
              mountPath: /host/opt
            - name: cni-bin2
              mountPath: /host/home
            - name: cni-conf
              mountPath: /host/etc
            - name: dbus
              mountPath: /host/var/lib/dbus
            - name: lib-modules
              mountPath: /lib/modules
            - name: xtables-lock
              mountPath: /run/xtables.lock
        - name: weave-npc
          args: ['--metrics-addr=0.0.0.0:6781']
          env:
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          image: 'weaveworks/weave-npc:2.3.0'
          securityContext:
            privileged: true
          volumeMounts:
            - name: xtables-lock
              mountPath: /run/xtables.lock
        - name: weave-tc
          image: 'qmachu/weave-tc:0.0.1'
          securityContext:
            privileged: true
          volumeMounts:
            - name: xtables-lock
              mountPath: /run/xtables.lock
            - name: lib-tc
              mountPath: /lib/tc
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      securityContext:
        seLinuxOptions: {}
      serviceAccountName: weave-net
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
        - key: CriticalAddonsOnly
          operator: Exists
      volumes:
        - name: weavedb
          hostPath:
            path: /var/lib/weave
        - name: cni-bin
          hostPath:
            path: /opt
        - name: cni-bin2
          hostPath:
            path: /home
        - name: cni-conf
          hostPath:
            path: /etc
        - name: dbus
          hostPath:
            path: /var/lib/dbus
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
        - name: lib-tc
          hostPath:
            path: /lib/tc
---
apiVersion: v1
kind: Secret
metadata:
  name: weave-password
  namespace: kube-system
type: Opaque
data:
  password: {{ .weave.password }}

@Quentin-M For some reason /lib/tc does not exist on my nodes. (Debian Jessie) installed with kops using k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08.

@thomaschaaf According to https://packages.debian.org/jessie/amd64/iproute2/filelist, you would be using /usr/lib/tc/ instead (and pareto is indeed in there).

@bboreham do you have any more insights on this issue? It seems like every day i come across another thread talking about dns timeouts here or there. It feels like a 'dirty little secret' at this point :)

No, no particular insight. I'm trying to cross-fertilise the conversations in the hope someone shows up and says "this is all very clear to me".

@bboreham i see, yes, that's the open source slogan right? "given enough eyes, every problem is trivial"
Thanks for your continued work. Let me know if there's something I can test that would be helpful.

I'll try @Quentin-M 's fix and report back.

so it would be on every host, and could be addressed using the host's own IP? I've seen discussions along those lines; unfortunately changing resolv.conf to point at that IP requires a Kubernetes change.

You can do this already with --resolv-conf passed to kubelet. Run a dnsmasq daemonset that proxies all dns queries to kube-dns using host networking, and listening on all interfaces. This reduces the DNS problems substantially.

As I understand it, --resolv-conf is a single setting for all pods, thus removing the ability to find services in the same namespace as the current pod.

That is what I meant by "requires a Kubernetes change" - to change the DNS server address without giving up any other features. If you don't need those features it's an option.

As I understand it, --resolv-conf is a single setting for all pods, thus removing the ability to find services in the same namespace as the current pod.

If you just need to change the dns server ip you can use --cluster-dns.

As I understand it, --resolv-conf is a single setting for all pods, thus removing the ability to find services in the same namespace as the current pod.

The generated search domains and options are preserved. resolv-conf only parses the nameservers afaik. That's how we set it up.

What DNS IP do you use that always resolves to the local host?

What DNS IP do you use that always resolves to the local host?

You can probably use the local docker bridge ip (172.17.0.1)

Address of the docker interface. This is probably setup dependent. I think you could use any interface on the host that is routable from pods (so not the loopback).

@jsravn I would like to learn more about your setup. Do you by chance use kops?

I would like to see your dnsmasq daemon set manifest if you are willing to post it. My understanding is that kops already runs dnsmasq as a container in its default kube-dns pod, so we would have to figure out how to disable that in a way that doesn't get undone when we use kops to update the cluster.

@dcowden You wouldn't touch the kube-dns pod; it still runs dnsmasq. The local dnsmasq caches all local queries on the node. The benefits are that the cache is localised, you can bypass kube-dns completely for external queries if you want (we do this), and it's more resilient to outages. I can't give you the exact daemonset at the moment, but it shouldn't be so hard: you need to set up hostNetworking and configure dnsmasq to listen on the local docker bridge. The trickier part is configuring kubelet with --resolv-conf, since that won't be easy in hosted solutions like GKE. In this case, it would be nice if k8s had a runtime API for configuring the DNS setup (which it doesn't afaik). You could probably do it with a custom iptables rule that intercepts DNS requests and transparently routes them to your local dnsmasq via DNAT; this would be done as part of the daemonset. That is feeling pretty hacky though.

(Apologies if I've taken this issue off topic - feel free to contact me on kubernetes slack if you want to discuss further ideas)

@jsravn thanks for this tip. I hadn't thought of this approach, but it has a number of benefits-- for example, it makes it much more straightforward to work in a split-dns corporate environment.

So, as far as I can tell from this thread, there isn't really a solution yet aside from some of these workarounds is there?

Not only is there not a solution, we don't know which of the various theories about the problem is most important in practice.

@bboreham Understandable, we've been migrating to Kubernetes and haven't had really any CNI work for us. Every single one appears to either have high latency or kube-dns issues. Just a bit frustrating since clearly other people are able to make kubernetes work. Hopefully we're able to diagnose which theory is most "important" and/or what has been causing these issues.

@jaredallard I agree with your assessment. For us, using standard networking doesn't work because we require encryption between nodes, which is hard to set up on bare metal, whereas weave 'just works'.

While technically a workaround, I believe that the dnsmasq solution provided by @jsravn is the right answer. In our case, we have split dns and all kinds of weird stuff. At some point, it's best to simply let the bare metal layer handle it. I think there's fairly decent evidence that people's SNAT/DNAT problems are pretty much all DNS, so i think running a dnsmasq process on each node makes sense, and should probably be the 'right way', as long as you're still using CNI.

Of course as you pointed out, I agree that if you can avoid CNI, that's probably the 'right choice'-- it removes a whole layer of stuff to deal with.

@jaredallard My weave-tc work around is simple enough to use and fixes the problem for us entirely.

@Quentin-M Does it solve just latency or issues with kube-dns as well? We've pretty much gotten rid of all issues with latency on calico w/ ip-in-ip, but kubedns doesn't work when it gets a lot of hits.

This particularly solves the kernel race condition inside conntrack that drops parallel A/AAAA packets, leading to static 5s latency on each DNS query, regardless of coredns/kubedns/powerdns...
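
If you want to check whether a given pod is affected before and after applying a workaround, a minimal sketch along these lines counts lookups that take roughly the 5s resolver timeout. The function names and the 4.5s threshold are assumptions for illustration. It relies on getaddrinfo with an unspecified address family, which on glibc issues the A and AAAA queries together.

```python
import socket
import time


def time_lookups(name, count, resolve=socket.getaddrinfo, clock=time.monotonic):
    """Resolve `name` `count` times and return each lookup's duration in seconds."""
    durations = []
    for _ in range(count):
        start = clock()
        try:
            resolve(name, 80)  # no family given: both A and AAAA are requested
        except socket.gaierror:
            pass  # a failed lookup still tells us how long it took
        durations.append(clock() - start)
    return durations


def count_slow(durations, threshold_s=4.5):
    """Count lookups that look like they hit the 5s resolver timeout."""
    return sum(1 for d in durations if d >= threshold_s)
```

For example, `count_slow(time_lookups("kubernetes.default.svc.cluster.local", 200))` run from inside a pod gives a rough hit rate. Local caching in the image, if any, would hide the problem, so treat a zero as inconclusive.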

Just posted a little write-up about our journey troubleshooting this issue there: https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/, including our workaround.

tj13 commented

@Quentin-M
Can it run on a non-Weave network? Our environment is OVS + OpenShift.

@Quentin-M
Hi, I have the same problem as @thomaschaaf :

No distribution data for pareto (/lib/tc//pareto.dist: No such file or directory)

However, I'm using CentOS 7 and there's no iproute2 package. What should I do in this case?

Edit:
Found out it was in /usr/lib64/tc instead of /usr/lib/tc.
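
Since the location differs per distro, a quick sketch for finding which directory to use as the hostPath for the lib-tc volume. The candidate list is just the three paths mentioned in this thread, so it may well be incomplete, and the function name is made up.

```python
from pathlib import Path

# Locations reported in this thread: default, Debian Jessie, CentOS 7.
CANDIDATES = ("/lib/tc", "/usr/lib/tc", "/usr/lib64/tc")


def find_tc_dist_dir(candidates=CANDIDATES, dist_file="pareto.dist"):
    """Return the first candidate directory containing dist_file, else None."""
    for directory in candidates:
        if (Path(directory) / dist_file).is_file():
            return directory
    return None


print(find_tc_dist_dir())
```

Run it on the node itself (not inside the container) and put whatever it prints in the lib-tc hostPath, keeping the mountPath at /lib/tc.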