microsoft/SDN

[K8S] Windows pod to service IP (served by Linux pod) connection stops working after some time

rtonde-nv opened this issue · 2 comments

In our production environment, connections from a Windows pod to a service IP (served by a Linux pod) stop working after some time. If we use the pod IP instead of the service IP, we don't see any issue, so this does not look like a basic network connectivity problem.
The same setup in our preprod environment works without any issue. The only difference is the number of k8s services: preprod has ~73 services, whereas production has ~230 services (and ~1290 VFP rules).
Our current understanding is that the root cause is the high number of VFP rules. Is there any limit on the number of VFP rules?
Can you please help identify the root cause?
We have collected packet captures and logs as suggested in https://github.com/microsoft/SDN/tree/master/Kubernetes/windows/debug
collect-logs.zip
server.etl-part1.zip
server.etl-part2.zip

Please let us know if more details are required.

Which Windows Server OS version are you using and is it up to date? There have been a number of relevant fixes here recently on Windows Server 2019 updates.

Which pod and which service IP are you trying to reach? Is the service IP showing up in the output of hnsdiag list loadbalancers?
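For anyone repeating this check across many services, here is a minimal Python sketch of one way to test whether a given service IP appears in that command's output. The helper names (ip_in_output, service_ip_listed) are hypothetical, and the sketch assumes only that the service IP appears somewhere in the printed text; it does not rely on any particular column layout of the hnsdiag output.

```python
import re
import subprocess


def ip_in_output(output: str, ip: str) -> bool:
    """Return True if `ip` occurs in `output` as a complete IPv4 address.

    The lookarounds stop 10.0.0.1 from matching inside 10.0.0.10
    or 110.0.0.1.
    """
    pattern = r"(?<!\d)(?<!\d\.)" + re.escape(ip) + r"(?!\d)(?!\.\d)"
    return re.search(pattern, output) is not None


def service_ip_listed(ip: str) -> bool:
    """Run `hnsdiag list loadbalancers` on the Windows node and search its text.

    Assumption: the command is on PATH and prints the service VIPs as
    plain text; the exact output format is not parsed here.
    """
    result = subprocess.run(
        ["hnsdiag", "list", "loadbalancers"],
        capture_output=True,
        text=True,
        check=True,
    )
    return ip_in_output(result.stdout, ip)
```

Calling service_ip_listed("10.96.0.10") on the affected node (the address is only a placeholder) returns False when no load balancer entry mentions that service IP, which would suggest the policy was never programmed or has been removed.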

Closing. We were seeing this issue because of load on the k8s cluster. Reducing the load on the cluster fixed it.