DNS-OARC/flamethrower

Flamethrower 0.11.0 sometimes fails to send TCP queries on FreeBSD to BIND 9.11

Mno-hime opened this issue · 2 comments

Flamethrower 0.11.0 sometimes fails to send TCP queries on FreeBSD 12.2 to BIND 9.11. The problem started for us with Flamethrower 0.11.0; version 0.10 from FreeBSD ports was fine. BIND 9.16 and 9.17 are fine as query targets, and the problem does not happen on Linux. Here is the original issue in ISC GitLab. The culprit seems to be a7b83e4 (identified with git bisect); with the following change to flame/tokenbucket.h the problem went away:

@@ -25,6 +25,7 @@ public:
     {
         if (_token_wallet < tokens) {
             if (_last_fill_ms.count() == 0) {
+                _token_wallet = _rate_qps;
                 _last_fill_ms = now_ms;
             } else if (now_ms > _last_fill_ms) {
                 auto elapsed_ms = (now_ms - _last_fill_ms).count();
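
To make the effect of the one-line change easier to see, here is a minimal sketch of a token bucket modelled on the diff context above. Only the lines visible in the diff come from flame/tokenbucket.h. The class name `SketchBucket`, the `_prefill` switch, and the refill rule (rate multiplied by elapsed time) are assumptions made for illustration, so this shows the behavioural difference on the first call rather than a confirmed root cause.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the bucket in flame/tokenbucket.h.
// Everything outside the lines shown in the diff is assumed.
class SketchBucket {
public:
    SketchBucket(double rate_qps, bool prefill_on_first_call)
        : _rate_qps(rate_qps)
        , _prefill(prefill_on_first_call)
    {
    }

    // Returns true if `tokens` could be taken from the wallet at time now_ms.
    bool consume(std::uint64_t tokens, std::chrono::milliseconds now_ms)
    {
        if (_token_wallet < tokens) {
            if (_last_fill_ms.count() == 0) {
                if (_prefill) {
                    // The proposed change: start with one second's worth of tokens.
                    _token_wallet = _rate_qps;
                }
                _last_fill_ms = now_ms;
            } else if (now_ms > _last_fill_ms) {
                auto elapsed_ms = (now_ms - _last_fill_ms).count();
                // Assumed refill rule: tokens accrue at _rate_qps per second.
                _token_wallet += _rate_qps * elapsed_ms / 1000.0;
                _last_fill_ms = now_ms;
            }
            if (_token_wallet < tokens) {
                return false; // still not enough tokens, caller must retry later
            }
        }
        _token_wallet -= tokens;
        return true;
    }

private:
    double _rate_qps;
    bool _prefill;
    double _token_wallet = 0.0;
    std::chrono::milliseconds _last_fill_ms{0};
};
```

In this sketch, at the per-sender rate of 333.333 QPS reported in the output below, the first `consume(1, now)` call is refused when the wallet starts empty and succeeds when it is pre-filled. Whether that first refusal is what stalls the TCP senders on FreeBSD is not established here.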

Reproducer

Start named from any recent BIND 9.11 version: named -f -c named.conf.

Start Flamethrower instances:

/usr/local/bin/flame --dnssec -P udp -F inet -Q 10000 -p 5300 -v 99 10.53.0.3 > flame.udp.4 &
/usr/local/bin/flame --dnssec -P udp -F inet6 -Q 10000 -p 5300 -v 99 [fd92:7065:b8e:ffff::3] > flame.udp.6 &
/usr/local/bin/flame --dnssec -P tcp -F inet -Q 10000 -p 5300 -v 99 10.53.0.3 > flame.tcp.4 &
/usr/local/bin/flame --dnssec -P tcp -F inet6 -Q 10000 -p 5300 -v 99 [fd92:7065:b8e:ffff::3] > flame.tcp.6 &

After some time, kill all Flamethrower instances with killall flame.

Grep for the total queries sent and received in the output files. The count is not always zero, but one in five TCP instances fails like this and does not recover:

$ grep ^total flame.*.*
flame.tcp.4:total sent  : 0
flame.tcp.4:total rcvd  : 0
flame.tcp.6:total sent  : 0
flame.tcp.6:total rcvd  : 0
flame.udp.4:total sent  : 80820
flame.udp.4:total rcvd  : 80803
flame.udp.6:total sent  : 80820
flame.udp.6:total rcvd  : 80777

flame.tcp.4:

--class: "IN"
--dnssec: true
--help: false
--qps-flow: null
--targets: null
--version: false
-F: "inet"
-M: "GET"
-P: "tcp"
-Q: "10000"
-R: false
-T: "A"
-b: null
-c: "10"
-d: "1"
-f: null
-g: "static"
-l: "0"
-n: "0"
-o: null
-p: "5300"
-q: "10"
-r: "test.com"
-t: "3"
-v: "99"
GENOPTS: []
TARGET: "10.53.0.3"
binding to 0.0.0.0
flaming target(s) [10.53.0.3] on port 5300 with 30 concurrent generators, each sending 100 queries every 1000ms on protocol tcp
query generator [static] contains 1 record(s)
rate limit @ 10000 QPS (333.333 QPS per concurrent sender)
0.919358s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
1.92136s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
2.92128s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
3.93132s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
4.94233s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
5.95264s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
6.96256s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
7.97184s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0
8.01863s: send: 0, avg send: 0, recv: 0, avg recv: 0, min/avg/max resp: 0/nan/0ms, in flight: 0, timeouts: 0

------
run id      : 28ebb537ed6abf2b
run start   : 2021-07-20T11:24:44Z
runtime     : 8.02064 s
total sent  : 0
total rcvd  : 0
min resp    : 0 ms
avg resp    : nan ms
max resp    : 0 ms
avg r qps   : 0
avg s qps   : 0
avg pkt     : 0 bytes
tcp conn.   : 45
timeouts    : 0 (nan%) 
bad recv    : 0
net errors  : 0

named configuration files

named.conf:

options {
    listen-on { 10.53.0.3; };
    listen-on-v6 port 5300 { fd92:7065:b8e:ffff::3; };
    port 5300;
    directory "/home/newman/output/ns3";
    allow-recursion { any; };
    query-source address 10.53.0.3;
    pid-file "named.pid";
    recursion yes;
    tcp-clients 50;
    statistics-file "named.stats";
};

view "default" {
    zone "." {
        type hint;
        file "root.hint";
    };
};

root.hint:

$TTL 999999
.                        IN NS  a.root-servers.nil.
a.root-servers.nil.      IN A   10.53.0.1

@Mno-hime Thanks for the detailed report! We'll take a look.

BIND 9.11 has been EoL for some time, and this is not an active issue for us anymore. I don't think this issue needs to be kept open.