How about adding an option flag to force the closing and reopening of connections, for testing servers behind a load balancer?
fnickels opened this issue · 4 comments
The current mode of keeping connections open works fine for hitting a static set of DNS servers, but when targeting a load balancer in front of a set of DNS servers that scales up or down, the test connections do not redistribute across the new set of targets after scale-out or scale-in events. I have manually restarted the test sessions to fake this kind of functionality.
I am not sure how most client resolvers behave with regard to keeping connections open with DNS servers. If they close connections by default, or after a certain amount of inactivity, I think that should be the default behavior of flamethrower, as it would represent real-world behavior; if they don't, then obviously the current behavior is fine.
I think simply forcing connections to open and close with each burst of queries would resolve this issue, and I imagine it could be a simple binary option flag.
Hi @fnickels, can you confirm which protocol you are using to do the testing - and is it the same protocol in front of and behind the load balancer? You mentioned sessions and keeping connections open, so I am guessing one of TCP, DoT, DoH. In all of those cases, flamethrower should be opening a new connection for each of the traffic generators (the number of which is controlled with -c), and then for each of those sending QCOUNT queries (controlled with -q) before closing and reconnecting.
Do you get the behavior you are looking for with -q 1?
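As a rough illustration of that lifecycle, here is a Python sketch (not flamethrower's actual C++ implementation) of what one of the -c traffic senders does for a connection-oriented protocol such as plain DNS over TCP; the build_query helper and example.com are just for illustration:
import random
import socket
import struct

def build_query(name, qtype=1):
    # Minimal DNS query: header (random ID, RD flag set, QDCOUNT=1) plus one question.
    header = struct.pack(">HHHHHH", random.randint(0, 0xFFFF), 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode() for label in name.split(".")) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # QTYPE, QCLASS=IN

def traffic_generator(server, qcount, bursts, name="example.com"):
    # One of the -c senders: open a connection, send QCOUNT (-q) queries on it,
    # then close and reconnect for the next burst.
    for _ in range(bursts):
        with socket.create_connection((server, 53)) as conn:
            for _ in range(qcount):
                msg = build_query(name)
                conn.sendall(struct.pack(">H", len(msg)) + msg)  # DNS over TCP is length-prefixed
                (length,) = struct.unpack(">H", conn.recv(2))
                conn.recv(length)  # read and discard the answer (sketch: assumes one recv suffices)
        # the connection is closed here; with -q 1, every query would get a fresh connection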
This is what we were using initially:
flame ns1edge-new-health-check-106-4a43349c758bce8a.elb.us-west-2.amazonaws.com -g randomlabel lblsize=10 lblcount=4 count=1000 -r riotgames.io -v 3 -c 500 -d 5 -q 40
We are trying to load test an autoscaling policy behind a load balancer with our ns1edge instances. The scaling policy adds nodes behind the load balancer just fine as the load goes up, but the traffic stays on the nodes that were active prior to the scaling operation.
And here is how I worked around the issue:
while [ true ]; do
echo flaming DNS for 75 seconds
flame ns1edge-new-health-check-106-4a43349c758bce8a.elb.us-west-2.amazonaws.com -g randomlabel lblsize=10 lblcount=4 count=1000 -r riotgames.io -v 3 -c 500 -d 5 -q 40 -l 75
done
By killing the job every 75 seconds the load redistributes as we would expect.
I am assuming the traffic is all UDP, which has me a little puzzled, but I watched https://www.youtube.com/watch?v=iONXcli1afI and at one point he talks about keeping connections open to drive more load. I don't know enough C++ to figure out where in the code to try to force disconnects, nor do I understand how this is accomplished with UDP, as I thought it was stateless.
I suspect the load balancer might have some role in this. But maybe forcing flamethrower to open a new connection would change the behavior.
Thanks for the detailed description. That command line would indeed generate UDP traffic, which is connectionless. However, my guess is that the load balancer is hashing a tuple of packet information to keep sending the same "connections" to the same nodes behind the LB - probably (source IP, destination IP, source port, destination port). See https://kb.isc.org/docs/aa-01183 which talks a bit about this concept.
If this is true, then since source IP, destination IP and destination port never change in your setup, the only variable is source port. flamethrower will use a different, random source port per traffic sender (-c, in your case 500). However, the source ports will never change during a run. I think your workaround works because when you start a new instance of flame, it chooses a new list of random source ports which then hash to different nodes behind your load balancer.
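Roughly, as an illustration of that hashing idea (a Python sketch with made-up IPs and backend names, not how the AWS load balancer is actually implemented):
import hashlib
import random

def pick_backend(src_ip, dst_ip, src_port, dst_port, backends):
    # Hash the flow tuple and map it onto the current backend pool.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

backends = ["node-a", "node-b", "node-c", "node-d"]    # hypothetical pool after a scale-out
source_ports = random.sample(range(20000, 60000), 5)   # fixed for the whole run, like the -c senders
for port in source_ports:
    # Same tuple every time means the same backend every time; only a new set of
    # source ports (i.e. a new flame run) spreads the load onto different nodes.
    print(port, "->", pick_backend("10.0.0.1", "10.0.0.53", port, 53, backends))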
I think it's fine to continue to use your workaround. Alternatively, in the spirit of your issue title, we could think about adding an option to rebind the UDP ports on an interval so that they change during a single run.
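Something like this, roughly (a Python sketch of the idea only, not an existing flamethrower option; the payload is assumed to be a pre-built DNS query in wire format):
import socket
import time

def send_with_rebind(server, payload, rebind_every=10.0, duration=75.0):
    # Periodically close and recreate the UDP socket so the kernel assigns a new
    # ephemeral source port, which the load balancer hashes to a (potentially)
    # different backend node.
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", 0))  # port 0: the kernel picks a fresh source port
        rebind_at = time.monotonic() + rebind_every
        while time.monotonic() < min(rebind_at, deadline):
            sock.sendto(payload, (server, 53))  # keep sending on the current source port
            time.sleep(0.01)
        sock.close()  # the next iteration binds a new port, so the flow re-hashes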
Yeah, not an urgent need, but I imagine others may run into similar issues, and it would be nice to either have that be the default behavior or make it selectable with an option flag.