Disruptor does not affect traffic sent via `kubectl port-forward`
Closed this issue · 10 comments
Note: This is the outcome of researching why https://github.com/grafana/quickpizza/pull/2/files was not working as expected. Thanks @ppcano for reporting this to us!
The current configuration for the disruptor will cause it to not disrupt traffic sent to the pods by means of `kubectl port-forward`, no matter if the target of the port-forward is a pod or a service.
The reason for this is that the disruption is limited to traffic flowing through the `eth0` interface, as restricted by the `iptables` command here:
xk6-disruptor/pkg/iptables/iptables.go
Line 13 in f405082
And defaulted here:
xk6-disruptor/cmd/agent/commands/http.go
Line 75 in c9403f9
This is because when using `kubectl port-forward`, traffic is forwarded through the `lo` interface. This can be checked by running `kubectl port-forward` and then `tcpdump` on the pod, observing that traffic only shows up when capturing on `lo` (`-i lo`):
The solution for this is non-trivial.
❌ Workaround attempt with `iface: "lo"`
A tempting way to work around this would be to specify `iface: "lo"` in the disruptor configuration. This, unfortunately, will not work, as the disruptor proxy itself uses `localhost` (thus `lo`) as upstream:
xk6-disruptor/cmd/agent/commands/http.go
Line 21 in c9403f9
If attempted, the generated `REJECT` rule:
xk6-disruptor/pkg/iptables/iptables.go
Line 15 in c9403f9
shuts down this otherwise infinite connection loop:
```
# Curl to upstream directly is refused by the REJECT rule.
/ # curl 127.0.0.1:3333
curl: (7) Failed to connect to 127.0.0.1 port 3333 after 0 ms: Couldn't connect to server

# Curl to the proxy echoes the error the proxy sees when its connection to upstream is
# severed by the same rule.
/ # curl localhost:8080
Get "http://127.0.0.1:3333/": dial tcp 127.0.0.1:3333: connect: connection reset by peer
```
The test, however, will appear to work, with neither disruptions nor resets. This is because `kubectl port-forward` forwards the target port on both the v4 and the v6 loopback address. k6 will see a connection reset while attempting to connect to `127.0.0.1:3333`, as per the `REJECT` rule above. When this happens, k6 automatically falls back to `[::1]:3333`, which is exempt from the disruption, and thus works normally, with neither disruption nor reset.
Moving forward
There are several paths forward, none of them trivial:
- Document the problem and leave disruption of `port-forward` out of the scope of the problem.
- Allow to use the pod IP, rather than `localhost`, as upstream, so the `iface: "lo"` workaround works.
  a. This will have bad UX
- Send proxy traffic from a specific address, and exclude that address from iptables redirection.
> Document the problem and leave disruption of `port-forward` out of the scope of the problem.
I think this should be done as a temporary measure, as the solution to the problem may take time to implement and test properly; it will avoid users facing the same issue wasting time debugging it.
> Allow to use the pod IP, rather than `localhost`, as upstream, so the `iface: "lo"` workaround works.
> a. This will have bad UX
I've not checked if this will actually solve the issue, but assuming it will, I don't see why this should be bad UX, as the IP can be detected by the agent and used in the command.
> I've not checked if this will actually solve the issue, but assuming it will, I don't see why this should be bad UX, as the IP can be detected by the agent and used in the command.
I probably should have elaborated further. I see two main issues with this approach:
The first one is that if we add this way to change the target upstream (e.g. `targetPodIP: true`, or something similar), this option effectively requires setting `iface: "lo"`. If `iface` is set to `lo` but `targetPodIP` is `false` (the default), the pod loses connectivity. Likewise, if `iface` is set to `eth0` (the default) and `targetPodIP` is `true`, the pod also loses connectivity. We could define a higher-level option, like `local: true`, as an abstraction to solve this.
The second (and probably more important) problem is that we cannot disrupt the same address we use as upstream. So if we use the pod IP as upstream, we will not be able to disrupt regular traffic, just localhost (port-forward) traffic. Having to specify this on the test does not feel like good UX to me, especially because it may fail if the deployment mode is changed later.
Thinking more about this, it might be possible to enable this without UX compromises by complicating the setup and `iptables` rules. The rough idea I have is to:
First, we add a new local-only IP to the pod, e.g. `127.6.0.1/8`, which the proxy will listen on. Then, we will modify the proxy code to egress traffic from this address (`net.Dialer.LocalAddr`).
Then, we modify the `iptables` rules so they look like the following:
```
-A PREROUTING -t nat -p tcp --dport {{upstreamPort}} ! -s 127.6.0.1/32 -j DNAT --to-destination 127.6.0.1:{{proxyPort}}
-A INPUT -i %s -p tcp --dport {{upstreamPort}} ! -s 127.6.0.1/32 -m state --state ESTABLISHED -j REJECT --reject-with tcp-reset
```
The effect of this would be that all traffic directed towards the upstream application will be redirected to the proxy, except the traffic from the proxy itself (as per `! -s 127.6.0.1/32`).
This removes the need to specify an interface (it can still be added if we find it valuable, but it's not needed), and redirects traffic to the real application except if that traffic comes from the proxy.
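On the proxy side, the idea could look roughly like this in Go (a sketch, not the actual xk6-disruptor code; `proxyAddr` and the helper name are assumptions, the address is the example from this thread): the proxy would listen on the dedicated address and bind its upstream connections to it via `net.Dialer.LocalAddr`, so its egress traffic matches the `! -s 127.6.0.1/32` exclusion in the rules above.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// proxyAddr is the dedicated local-only address from the proposal above;
// the iptables rules exclude it as a source (! -s 127.6.0.1/32).
const proxyAddr = "127.6.0.1"

// upstreamDialer returns a dialer whose outgoing connections are bound to
// proxyAddr via net.Dialer.LocalAddr, so traffic originated by the proxy
// can be told apart from traffic that must be redirected to it.
func upstreamDialer() *net.Dialer {
	return &net.Dialer{
		Timeout:   5 * time.Second,
		LocalAddr: &net.TCPAddr{IP: net.ParseIP(proxyAddr)}, // port 0: kernel picks one
	}
}

func main() {
	fmt.Println(upstreamDialer().LocalAddr)
	// The proxy itself would listen on the same address, e.g.:
	// ln, err := net.Listen("tcp", net.JoinHostPort(proxyAddr, proxyPort))
}
```

With the dialer bound this way, no interface needs to be named in the rules: the source address alone distinguishes proxy traffic from everything else.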
> The second (and probably more important) problem is that we cannot disrupt the same address we use as upstream. So if we use the podIP as upstream, we will not be able to disrupt regular traffic, just localhost (port-forward) traffic. Having to specify this on the test does not feel like good UX to me, especially because it may fail if the deployment mode is changed later.
I think that if we document the issue for pods exposed using port forwarding, and we define a simple option that addresses it (similar to the `local` option you suggested above), the UX is acceptable.
Also, I think we don't need to support disrupting port-forwarded traffic and regular traffic at the same time. This seems like a reasonable limitation to me.
> First, we add a new local-only IP to the pod, e.g. `127.6.0.1/8`, which the proxy will listen on.
Do we need to create a new IP address? If the proxy always binds to the pod IP and the iptables rules exclude this address as a source IP, I think the result would be the same. The traffic coming to the pod through `eth0` would never have this IP address as a source address (the traffic originates outside the pod).
> Do we need to create a new IP address? If the proxy always binds to the pod IP and the iptables rules exclude this address as a source IP, I think the result would be the same.
It sounds like this should be the case, but using an IP that already has a different purpose, and is perceived to be external, for this seems... strange. Even we might later find it confusing why we decided to do this.
As an extra bonus of using a dedicated IP, we reduce the chances of collision with applications.
> First, we add a new local-only IP to the pod, e.g. `127.6.0.1/8`, which the proxy will listen on.
Which interface will have this IP address?
> Which interface will have this IP address?
My idea was to add it to `lo`.