Network-loss doesn't work as expected in multi-container pod
jan-machacek-kosik opened this issue · 1 comments
What happened:
I have a pod with one main container and three sidecars. When network loss is applied to the custom sidecar container and the destination host, all network connectivity in the pod is lost.
What you expected to happen:
I expect traffic loss to affect only connections from the targeted sidecar container to my otel-collector K8s service.
How to reproduce it (as minimally and precisely as possible):
These are the env values for this experiment:
- name: TARGET_CONTAINER
  value: otel-agent
- name: LIB_IMAGE
- name: NETWORK_PACKET_CORRUPTION_PERCENTAGE
  value: "100"
- name: TOTAL_CHAOS_DURATION
  value: "600"
- name: CONTAINER_RUNTIME
  value: containerd
- name: DESTINATION_HOSTS
  value: dev-collector.otel-collector.svc.cluster.local
- name: DEFAULT_HEALTH_CHECK
  value: "false"
- name: SEQUENCE
  value: parallel
So I expected that the connection from the otel-agent container to dev-collector.otel-collector.svc.cluster.local would be disrupted, while all other connections from all pods to any endpoint would keep working. However, while this experiment is running, every connection from all pods is disrupted, which causes the readiness probe to fail.
When I investigated how this experiment works, I found that these commands are applied:
sudo nsenter -t 561580 -n tc qdisc replace dev eth0 root handle 1: prio
sudo nsenter -t 561580 -n tc qdisc replace dev eth0 parent 1:3 netem loss 100
sudo nsenter -t 561580 -n tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 10.0.30.140 flowid 1:3
From these rules, it looks like only the connection to 10.0.30.140 should be cut, which is correct.
But in the real experiment, every connection leaving the pod is cut. For example, the sidecar with the proxysql container cannot connect to the database.
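A minimal sketch for inspecting what is actually installed while the experiment runs. Containers in a pod share one network namespace, so the qdisc on eth0 is seen by every container, and the u32 filter on the destination IP is the only thing that separates targeted traffic from the rest. The per-class counters should show whether packets that do not match the filter still end up in band 1:3; the prio qdisc's priomap can place unfiltered packets there for some TOS/priority values, so that may be worth checking. The PID and interface are copied from the commands above; the `DRY_RUN` switch and the `run` and `inspect_tc` helpers are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Sketch: inspect the tc state inside the target's network namespace.
# PID and IFACE come from the commands in this report; adjust as needed.
PID="${PID:-561580}"
IFACE="${IFACE:-eth0}"
# Hypothetical switch: with DRY_RUN=1 the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

inspect_tc() {
  # Root prio qdisc and the netem child, with drop counters.
  run sudo nsenter -t "$PID" -n tc -s qdisc show dev "$IFACE"
  # Per-band (class) counters: traffic in 1:3 is subject to netem.
  run sudo nsenter -t "$PID" -n tc -s class show dev "$IFACE"
  # Filters attached to the root qdisc (the u32 match on the destination IP).
  run sudo nsenter -t "$PID" -n tc filter show dev "$IFACE" parent 1:
}

inspect_tc
```

Running it on the node with `DRY_RUN=0` executes the commands instead of printing them. Comparing the counters for class 1:3 before and after a failed proxysql connection attempt would indicate whether that traffic is passing through the netem band.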
Anything else we need to know?:
I run this experiment on an AKS cluster.
Kubernetes version: 1.29.2
Litmus Helm targetRevision: 3.8.0.
Manifest of the experiment is attached.
Further investigation revealed that the problem affects only the sidecar with the proxysql container; other outgoing connections work.