TCP timeout issue
Y-Tian opened this issue · 6 comments
[Logs]:
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.015 [DEBUG] plain handler: new tcp connection from 127.0.0.1:33410
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.017 [DEBUG] plain handler for 127.0.0.1:33410 returned. closing conn
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.376 [DEBUG] GrafanaNet sent metrics in 71.701728ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.386 [DEBUG] GrafanaNet sent metrics in 81.80253ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.388 [DEBUG] GrafanaNet sent metrics in 85.61965ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.394 [DEBUG] GrafanaNet sent metrics in 92.004397ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.396 [DEBUG] GrafanaNet sent metrics in 92.756409ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.397 [DEBUG] GrafanaNet sent metrics in 94.793193ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.404 [DEBUG] GrafanaNet sent metrics in 102.231334ms -msg size 2
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.408 [DEBUG] GrafanaNet sent metrics in 104.683852ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.409 [DEBUG] GrafanaNet sent metrics in 105.30543ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.409 [DEBUG] GrafanaNet sent metrics in 105.159224ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.410 [DEBUG] GrafanaNet sent metrics in 106.679546ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.414 [DEBUG] GrafanaNet sent metrics in 110.317278ms -msg size 1
Mar 9 23:49:55 vagrant carbon-relay-ng[18688]: 2020-03-09 23:49:55.455 [DEBUG] GrafanaNet sent metrics in 152.397932ms -msg size 1
Mar 9 23:50:00 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:00.000 [DEBUG] stats flushing for 2020-03-09 23:50:00 +0000 UTC m=+395.121794354 to graphite
Mar 9 23:50:03 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:03.818 [WARNING] stats dialing 192.0.2.0:2003 failed: dial tcp 192.0.2.0:2003: connect: connection timed out. will retry
Mar 9 23:50:05 vagrant nodejs[18721]: Flushing stats at Mon Mar 09 2020 23:50:05 GMT+0000 (Coordinated Universal Time)
Mar 9 23:50:05 vagrant nodejs[18721]: { counters:
Mar 9 23:50:05 vagrant nodejs[18721]: { 'statsd.bad_lines_seen': 0,
Mar 9 23:50:05 vagrant nodejs[18721]: 'statsd.packets_received': 0,
Mar 9 23:50:05 vagrant nodejs[18721]: 'statsd.metrics_received': 0 },
Mar 9 23:50:05 vagrant nodejs[18721]: timers: {},
Mar 9 23:50:05 vagrant nodejs[18721]: gauges: { 'statsd.timestamp_lag': 0 },
Mar 9 23:50:05 vagrant nodejs[18721]: timer_data: {},
Mar 9 23:50:05 vagrant nodejs[18721]: counter_rates:
Mar 9 23:50:05 vagrant nodejs[18721]: { 'statsd.bad_lines_seen': 0,
Mar 9 23:50:05 vagrant nodejs[18721]: 'statsd.packets_received': 0,
Mar 9 23:50:05 vagrant nodejs[18721]: 'statsd.metrics_received': 0 },
Mar 9 23:50:05 vagrant nodejs[18721]: sets: {},
Mar 9 23:50:05 vagrant nodejs[18721]: pctThreshold: [ '90', '95', '99' ] }
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.008 [DEBUG] plain handler: new tcp connection from 127.0.0.1:33414
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.010 [DEBUG] plain handler for 127.0.0.1:33414 returned. closing conn
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.399 [DEBUG] GrafanaNet sent metrics in 72.260998ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.406 [DEBUG] GrafanaNet sent metrics in 80.667469ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.410 [DEBUG] GrafanaNet sent metrics in 84.23987ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.422 [DEBUG] GrafanaNet sent metrics in 95.515774ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.423 [DEBUG] GrafanaNet sent metrics in 97.165266ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.423 [DEBUG] GrafanaNet sent metrics in 97.654528ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.443 [DEBUG] GrafanaNet sent metrics in 117.550005ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.444 [DEBUG] GrafanaNet sent metrics in 118.079867ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.445 [DEBUG] GrafanaNet sent metrics in 119.07257ms -msg size 2
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.445 [DEBUG] GrafanaNet sent metrics in 118.893423ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.446 [DEBUG] GrafanaNet sent metrics in 119.770423ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.447 [DEBUG] GrafanaNet sent metrics in 119.983126ms -msg size 1
Mar 9 23:50:05 vagrant carbon-relay-ng[18688]: 2020-03-09 23:50:05.448 [DEBUG] GrafanaNet sent metrics in 121.715464ms -msg size 1
Mar 9 23:50:07 vagrant carbon-relay-ng[18688]: 2020/03/09 23:50:07 dial tcp 192.0.2.0:2003: connect: connection timed out
As you can see above, there is a recurring timeout, exactly every 2 minutes.
I thought this might be related to issue#250 and PR#318, so I changed plain_read_timeout and pickle_read_timeout in my carbon-relay-ng.ini to "0s".
However, I still get the TCP timeout error.
Is this expected behavior (i.e. every connection creates a new socket and each socket automatically times out after 2 minutes)? If not, I'd appreciate advice on how to fix it.
Hello Yuda Tian,
The timeouts you see are on an outbound carbon/TCP connection to 192.0.2.0:2003. That would be either a carbon route pointing at that address, or instrumentation.graphite_addr if you have configured it (please confirm which one).
The settings you mention control timeouts on inbound connections, and are therefore unrelated.
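To make the distinction concrete, here is a rough sketch of where each knob lives in carbon-relay-ng.ini (I'm guessing at your route definition, and the exact init/route syntax depends on your carbon-relay-ng version, so adjust to what you actually have):

# inbound listeners -- plain_read_timeout / pickle_read_timeout only apply here
listen_addr = "0.0.0.0:2003"
plain_read_timeout = "0s"
pickle_addr = "0.0.0.0:2013"
pickle_read_timeout = "0s"

# outbound carbon/tcp connections are created either by a route ...
init = [
    'addRoute sendAllMatch carbon-default  192.0.2.0:2003 spool=true pickle=false',
]

# ... or by the internal stats reporter
[instrumentation]
graphite_addr = "192.0.2.0:2003"
graphite_interval = 10000   # ms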
Right, I'm trying to discard the instrumentation metrics (by sending them to a reserved IP so they get dropped); I think it's referenced here:
AFAIK there doesn't seem to be any support for this yet?
That's correct. The simplest solution is to send the stats back into the relay, which is very handy anyway.
So I would set graphite_addr = "0.0.0.0:2003", sending the stats back to the listen_addr?
yep
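e.g. something along these lines (a minimal sketch, assuming the relay's plain listener is still on the default 0.0.0.0:2003):

[instrumentation]
# point the relay's own stats back at its plain listener (i.e. listen_addr)
graphite_addr = "0.0.0.0:2003"
graphite_interval = 10000   # ms

The stats then re-enter the relay like any other metrics and are handled by your normal routes, and nothing dials the unreachable 192.0.2.0:2003 anymore.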