Telegraf timeouts

Question

rpatel3001 opened this issue 3 years ago · 14 comments

Getting a ton of the following error

Error in plugin: exec: command timed out for command 'bash /scripts/telegraf_input_readsb_protoc_range.sh':

The script takes just about 5 seconds (the default timeout) to run on my machine (a Pi 2B) running both readsb-protobuf and dump978 in Docker as the only load. I fixed my issue by adding a 10s timeout line for this script in /rootfs/etc/cont-init.d/04-telegraf at line 92, but there may be a better fix.
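To check whether a script really runs close to the limit, one option is to run it by hand under the coreutils `timeout` command, which exits with status 124 when the limit is hit. This is only a sketch, assuming GNU `timeout` is available in the container; `check_runtime` is a hypothetical helper name, not something shipped in the image.

```shell
# Hypothetical helper: run a command under a time limit (in seconds)
# and report whether it finished, timed out, or failed for another reason.
check_runtime() {
  limit="$1"
  shift
  timeout "$limit" "$@" >/dev/null 2>&1
  status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout returns 124 when it had to stop the command
    echo "timed out"
  elif [ "$status" -eq 0 ]; then
    echo "ok"
  else
    echo "failed"
  fi
}

# Example (script path taken from the error message above):
# check_runtime 5 bash /scripts/telegraf_input_readsb_protoc_range.sh
```

If the script reports "timed out" at 5 but "ok" at 10, raising the plugin timeout is a reasonable workaround.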

Answer 1 · 2021-11-14T07:01:31.000Z

Prometheus works well, thanks for that. Only complaint I have is that the schema is different in the database, though I don't know how much of that is in your control, and it's fairly easy to strip a few columns and a prefix from the field name.

Also, after updating to tonight's image, I'm getting this spammed to my log every second:

readsb     | [telegraf_vrs_connector] 2021/11/14 01:54:37 socat[19307] E connect(5, AF=2 127.0.0.1:33333, 16): Connection refused

Is this related or should I open a new issue? Does the VRS connector need to be enabled all the time?

Answer 2 · 2021-11-14T08:20:00.000Z

The VRS connector outputs the JSON that I feed into Telegraf in order to produce the data for Influx/Prometheus. I’ll have to look into this one.

Answer 3 · 2021-11-15T02:53:54.000Z

Ah I didn't notice before but aircraft data was no longer being logged. Fixed this one by replacing [[ -n "$INFLUXDBURL" ]] with [ -n "$INFLUXDBURL" ] || [ -n "$ENABLE_PROMETHEUS" ] in /etc/services.d/readsb/run. However, I'd prefer an option to turn off logging of aircraft data while keeping all the statistical data. Also, now getting

[readsb] 2021/11/14 21:21:57 Beast TCP output: Unable to send data, disconnecting: 192.168.0.10 port 39636 (fd 17, SendQ 19614)

at random intervals from 6 seconds to several minutes. The port changes every time, in the range of about 37000 to 48000 as far as I've seen. The IP is the server running influxdb/adsbx/tar1090. This might be a network issue on my end dropping the connection?
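The condition change described in this answer can be sketched as a small function. This is an illustration only: `stats_backend_enabled` is a hypothetical name, and the real run script may structure the test differently. The variable names INFLUXDBURL and ENABLE_PROMETHEUS are the ones quoted above.

```shell
# Hypothetical wrapper around the modified test: succeeds when either
# an InfluxDB URL or the Prometheus switch is set to a non-empty value.
stats_backend_enabled() {
  [ -n "${INFLUXDBURL:-}" ] || [ -n "${ENABLE_PROMETHEUS:-}" ]
}

if stats_backend_enabled; then
  echo "stats backend configured"
else
  echo "no stats backend configured"
fi
```

With the original `[[ -n "$INFLUXDBURL" ]]` test alone, a Prometheus-only setup would take the second branch.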

Answer 4 · 2021-11-24T22:45:22.000Z

I'm seeing the same connection refused messages using the image mikenye/readsb-protobuf:v4.0.2.

[telegraf_vrs_connector] 2021/11/24 23:43:25 socat[8040] E connect(5, AF=2 127.0.0.1:33333, 16): Connection refused

However, I didn't succeed in getting that metrics page served :(

Answer 5 · 2021-11-28T05:55:17.000Z

@visibilityspots you can fix that message by either doing the manual fix mentioned above or by explicitly setting the VRS port environment variable in your docker compose file.

Answer 6 · 2021-12-10T21:51:51.000Z

I'm still wondering what that VRS port should be, though?

Answer 7 · 2021-12-10T22:42:28.000Z

It should be the same as in the error message: 33333.

Answer 8 · 2021-12-11T15:30:45.000Z

Perfect, Prometheus works. I only got this error message at the beginning:
[telegraf] 2021-12-10T22:08:01Z E! [inputs.exec] Error in plugin: metric parse error: expected field at 72:31: "polar_range,bearing=71 range= 1639174080000000000"

and from time to time this timeout:
[telegraf] 2021-12-11T03:14:09Z E! [inputs.exec] Error in plugin: exec: command timed out for command 'bash /scripts/telegraf_input_autogain.sh':

@rpatel3001 I'm wondering if you have perhaps already translated the InfluxDB Grafana dashboard to a Prometheus-based one and are willing to share that JSON? I've got the obvious graphs already but need to figure out the ones with calculations :)

Answer 9 · 2021-12-11T17:12:03.000Z

The first one is expected for a short time after container startup, as per #21 (comment).
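The first error comes from a line whose `range` field has no value, which the line protocol parser rejects. As an illustration of the kind of guard that would avoid emitting such a line, here is a sketch; `emit_polar_range` is a hypothetical function and is not taken from the actual scripts in the image.

```shell
# Hypothetical emitter: print one line-protocol point for a bearing,
# but skip it entirely when no range value is available yet.
emit_polar_range() {
  bearing="$1"
  range="$2"
  timestamp="$3"
  # An empty field value ("range= ") is what triggers the parse error
  [ -n "$range" ] || return 0
  printf 'polar_range,bearing=%s range=%s %s\n' "$bearing" "$range" "$timestamp"
}

emit_polar_range 71 "" 1639174080000000000      # prints nothing
emit_polar_range 71 12.5 1639174080000000000    # prints a valid point
```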

The second can be fixed similarly to my fix in the first post: insert timeout = "10s" at the end of /etc/telegraf/telegraf.d/inputs_file_autogain.conf and then kill telegraf. Note that this has to be done every time the container is restarted.
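Since the edit has to be repeated after every restart, it may help to script it. A minimal sketch, assuming the exec input is the last table in the config file so that an appended key lands in it; `add_exec_timeout` is a hypothetical helper name.

```shell
# Hypothetical helper: append a timeout setting to a Telegraf config file
# unless the file already contains one.
add_exec_timeout() {
  conf="$1"
  value="${2:-10s}"
  if ! grep -q '^[[:space:]]*timeout[[:space:]]*=' "$conf"; then
    printf '  timeout = "%s"\n' "$value" >> "$conf"
  fi
}

# Usage inside the container (path from the post above):
# add_exec_timeout /etc/telegraf/telegraf.d/inputs_file_autogain.conf 10s
# Then kill telegraf so it starts again with the new config.
```

The `grep` check keeps the helper safe to run more than once without stacking duplicate `timeout` lines.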

Answer 10 · 2021-12-11T17:36:19.000Z

That explains it, thanks for the pointer! I've uploaded my version of the Prometheus-based Grafana dashboard; maybe it could be used as a template to create an official one, @mikenye? I'll delete mine once there is an approved mikenye one ;)
https://grafana.com/grafana/dashboards/15377

Answer 11 · 2021-12-11T18:20:59.000Z

I also adapted the existing InfluxDB v1 dashboard, a little further from the original than yours. JSON here: https://gist.github.com/rpatel3001/a703c6b70863ab29eea2af7a8196920a

I never was able to get the radar graph working for max range.

Answer 12 · 2021-12-16T09:42:37.000Z

@rpatel3001 isn't that gist still using Influx as a datasource?

Answer 13 · 2022-01-03T01:31:11.000Z

@visibilityspots ah sorry, I guess I didn't fully read your question. I'm using InfluxDB V2 to pull in the Prometheus-formatted data, not Prometheus itself. The published dashboard uses InfluxDB V1, which is incompatible with V2; my dashboard is modified to use V2.

Answer 14 · 2022-02-16T03:04:35.000Z

[telegraf_vrs_connector] 2022/02/15 22:01:56 socat[427] E connect(5, AF=2 127.0.0.1:33333, 16): Connection refused

I am getting this issue when the container starts. I have READSB_NET_VRS_PORT=33333 set, but I still get the same error.

EDIT: Ignore, it's working fine. It only throws an error once at startup.