Telegraf timeouts
rpatel3001 opened this issue · 14 comments
Getting a ton of the following error:
Error in plugin: exec: command timed out for command 'bash /scripts/telegraf_input_readsb_protoc_range.sh':
The script takes just about 5 seconds (the default timeout) to run on my machine (pi 2B) running both readsb-protobuf and dump978 in docker as the only load. I fixed my issue by adding a 10s timeout line for this script in /rootfs/etc/cont-init.d/04-telegraf at line 92, but there may be a better fix.
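For reference, the effect of that change is an exec input stanza with an explicit timeout. A minimal sketch, assuming the generated config uses Telegraf's inputs.exec plugin; the output path below is hypothetical (not the one the image uses), and data_format = "influx" is an assumption:

```shell
# Hypothetical output path, for illustration only.
conf="${TMPDIR:-/tmp}/inputs_exec_range.conf"

# Write an exec input stanza with the timeout raised from the 5s default to 10s.
cat > "$conf" <<'EOF'
[[inputs.exec]]
  commands = ["bash /scripts/telegraf_input_readsb_protoc_range.sh"]
  timeout = "10s"
  data_format = "influx"  # assumption, not confirmed against the image
EOF
```

The timeout applies per command, so a script that runs for about 5 seconds on a slow board gets some headroom instead of sitting right at the limit.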
Prometheus works well, thanks for that. Only complaint I have is that the schema is different in the database, though I don't know how much of that is in your control, and it's fairly easy to strip a few columns and a prefix from the field name.
Also, after updating to tonight's image, I'm getting this spammed to my log every second:
readsb | [telegraf_vrs_connector] 2021/11/14 01:54:37 socat[19307] E connect(5, AF=2 127.0.0.1:33333, 16): Connection refused
Is this related or should I open a new issue? Does the VRS connector need to be enabled all the time?
The VRS connector outputs the JSON that I feed into Telegraf in order to produce the data for Influx/Prometheus. I’ll have to look into this one.
Ah I didn't notice before but aircraft data was no longer being logged. Fixed this one by replacing [[ -n "$INFLUXDBURL" ]] with [ -n "$INFLUXDBURL" ] || [ -n "$ENABLE_PROMETHEUS" ] in /etc/services.d/readsb/run. However, I'd prefer an option to turn off logging of aircraft data while keeping all the statistical data. Also, now getting
[readsb] 2021/11/14 21:21:57 Beast TCP output: Unable to send data, disconnecting: 192.168.0.10 port 39636 (fd 17, SendQ 19614)
at random intervals from 6 seconds to several minutes. The port changes every time, in the range of about 37000 to 48000 as far as I've seen. The IP is the server running influxdb/adsbx/tar1090. This might be a network issue on my end dropping the connection?
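The condition change described above can be sketched in isolation. The helper name telegraf_needed and the echo messages are hypothetical; what the real run script does inside the branch is not shown in this thread:

```shell
# Succeeds when either an InfluxDB URL is set or Prometheus output is enabled.
# ${VAR:-} keeps the check safe if the script runs with "set -u".
telegraf_needed() {
  [ -n "${INFLUXDBURL:-}" ] || [ -n "${ENABLE_PROMETHEUS:-}" ]
}

if telegraf_needed; then
  echo "telegraf output configured"
else
  echo "telegraf output not configured"
fi
```

With the original [[ -n "$INFLUXDBURL" ]] check, a Prometheus-only setup falls into the second branch, which would match the symptom of aircraft data no longer being logged.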
I'm seeing the same connection refused messages using the image mikenye/readsb-protobuf:v4.0.2.
[telegraf_vrs_connector] 2021/11/24 23:43:25 socat[8040] E connect(5, AF=2 127.0.0.1:33333, 16): Connection refused
however I didn't succeed in getting that metrics page served :(
@visibilityspots you can fix that message by either doing the manual fix mentioned above or by explicitly setting the VRS port environment variable in your docker compose file.
yet I'm wondering what that VRS port should be?
Should be the same as the error message, 33333
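To check whether anything is actually listening on that port, a quick probe can be run inside the container. This is only a sketch: the helper name port_open is hypothetical, and it relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh:

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to host:port can be opened.
port_open() {
  local host=$1 port=$2
  # Open the connection in a subshell so the file descriptor is closed on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_open 127.0.0.1 33333; then
  echo "VRS port 33333 is accepting connections"
else
  echo "nothing is listening on 127.0.0.1:33333"
fi
```

If the probe fails while socat keeps logging "Connection refused", the VRS output is most likely not enabled.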
Perfect, Prometheus works. I only got this error message at the beginning:
[telegraf] 2021-12-10T22:08:01Z E! [inputs.exec] Error in plugin: metric parse error: expected field at 72:31: "polar_range,bearing=71 range= 1639174080000000000"
and from time to time this timeout:
[telegraf] 2021-12-11T03:14:09Z E! [inputs.exec] Error in plugin: exec: command timed out for command 'bash /scripts/telegraf_input_autogain.sh':
@rpatel3001 I'm wondering if you have perhaps already translated the InfluxDB Grafana dashboard to a Prometheus-based one and are willing to share the JSON? I've got the obvious graphs already but need to figure out the ones with calculations :)
the first one is expected for a short time after container startup as per #21 (comment)
the second can be fixed similarly to my fix in the first post: insert timeout = "10s" at the end of /etc/telegraf/telegraf.d/inputs_file_autogain.conf and then kill telegraf. Note that this has to be done every time the container is restarted.
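Since the change has to be reapplied after every restart, it may help to wrap it in a small helper. A minimal sketch, with the function name add_exec_timeout being hypothetical; it assumes the exec input is the last table in the file, so an appended key ends up in that table:

```shell
# Append a timeout line to a telegraf input config unless one is already there.
add_exec_timeout() {
  local conf=$1
  local timeout=${2:-10s}
  # Skip the append if any timeout key already exists, so reruns are harmless.
  if ! grep -q '^[[:space:]]*timeout[[:space:]]*=' "$conf"; then
    printf 'timeout = "%s"\n' "$timeout" >> "$conf"
  fi
}

# Intended use inside the container (path taken from the comment above):
# add_exec_timeout /etc/telegraf/telegraf.d/inputs_file_autogain.conf 10s
```

After running it, telegraf still has to be killed so that it comes back up with the new config, for example with pkill telegraf inside the container.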
That explains it, thanks for the pointer! I've uploaded my version of the Prometheus-based Grafana dashboard; maybe it could be used as a template to create an official one, @mikenye? I'll delete mine once there is an approved mikenye one ;)
https://grafana.com/grafana/dashboards/15377
I also adapted the existing InfluxDB v1 dashboard, a little further from the original than yours. JSON here: https://gist.github.com/rpatel3001/a703c6b70863ab29eea2af7a8196920a
I never was able to get the radar graph working for max range.
@rpatel3001 that gist is still using influx as a datasource?
@visibilityspots ah sorry, didn't fully read your question I guess. I'm using InfluxDB V2 to pull in the Prometheus-formatted data, not Prometheus itself. The published dashboard uses InfluxDB V1, which is incompatible with V2; my dashboard is modified to use V2.
[telegraf_vrs_connector] 2022/02/15 22:01:56 socat[427] E connect(5, AF=2 127.0.0.1:33333, 16): Connection refused
I am getting this issue when the container starts. I have READSB_NET_VRS_PORT=33333 set, but I still get the same error.
EDIT: Ignore, it's working fine. It only throws an error once at startup.