frdmn/docker-speedtest-grafana

Works fine on a RPi 4 for weeks, then ceases to function

mmaxbar opened this issue · 12 comments

I got this to run on a rpi4 with a modified docker-compose.yml, just needed to build the "speedtest" container on the rpi4 instead of pulling it from the repository to make it work.

Now I observe the following problem:
After some days/weeks, the speedtest container stops adding new data to the influxdb. In the logs I can see the speedtest container restart due to the node index.js command exiting with some error related to name resolution. No problem with that, shit happens, maybe network is temporarily down for some reason etc..

The "speedtest" container is then automatically restarted as it should be, but does not continue doing its job of performing speed tests, I cant figure out why... docker-compose down + docker-compose up fixes it.

I think the problem is connected to the SIGTERM, see the log below. This happens in the first iteration after the container is restarted.

Any ideas?

speedtest    | [INFO][Fri Jan 22 2021 17:12:48 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
speedtest    | [INFO][Fri Jan 22 2021 17:13:17 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 19.204848000000002, Upload: 15.571952, Ping: 45.808
speedtest    | [INFO][Fri Jan 22 2021 17:13:17 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
speedtest    | [INFO][Fri Jan 22 2021 17:43:17 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
speedtest    | Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
speedtest    | {"type":"log","timestamp":"2021-01-22T17:43:34Z","message":"Failed to resolve host name. Cancelling test suite.","level":"error"}
speedtest    | {"error":"Host resolve failed: Timeout occurred in connect."}
speedtest    | [INFO][Fri Jan 22 2021 17:43:36 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
speedtest    | Command was killed with SIGTERM (Termination): speedtest --accept-license --accept-gdpr -f json
speedtest    | {"type":"log","timestamp":"2021-01-22T17:43:55Z","message":"Error: [11] Cannot read: Resource temporarily unavailable","level":"error"}

I face the same problem on x86. Working well for weeks and suddently it stops submitting new data to influxdb.

[INFO][Mon Jan 25 2021 18:13:01 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 989.088632, Upload: 94.96044, Ping: 3.283
[INFO][Mon Jan 25 2021 18:13:02 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...,
[INFO][Mon Jan 25 2021 18:28:02 GMT+0000 (Coordinated Universal Time)] Starting speedtest...,
[INFO][Mon Jan 25 2021 18:28:30 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 997.594568, Upload: 95.652912, Ping: 4.846,
[INFO][Mon Jan 25 2021 18:28:30 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...,
[INFO][Mon Jan 25 2021 18:43:30 GMT+0000 (Coordinated Universal Time)] Starting speedtest...,
Command was killed with SIGTERM (Termination): speedtest --accept-license --accept-gdpr -f json,

Ahh this is the same problem I posted....
it's because Speedtest thinks you are using the request for commercial so it is asking you to accept the licence agreement.

I got round it by creating a cron job tear things down and restart them every 3 hours - at least this way I keep some visible monitoring going (I have my monitor set to every 10 minutes (SPEEDTEST_SPEEDTEST_INTERVAL=600) to try and get some granular detail of how poor my ISP is!)

sudo docker-compose - stop
sudo docker-compose up -d 

Be good if someone can work out how to handle the JSON response and accept it Y - that would solve the problem.

@R32GUY : On what info do you base your diagnosis that this has sth to do with accepting the license agreement? Can you explain a bit?

I had a similar issue when i was running speedtest-cli from command line - but now i've looked at the code I suspect the problem is that Speedtest in the docker container is just erroring and there is no handled exception to retry...

} catch (err) { console.error(err.message); process.exit(1);

My return code =2 when it fails (and has been quite a lot, but this could be down to Speedtest or the fact that my internet and DNS resolution is down, so I'm going to try and amend the index.js to handle the error and retry

My Short term solution to fix this is:

Stop the containers..

Navigate to /docker-speedtest-grafana/docker/speedtest

nano index.js

and at the bottom, remove the process exit on error >

} catch (err) {
console.error(err.message);
process.exit(1);
}
})();

Replace with

} catch (err) {
console.error(err.message);
sleep.sleep(5);
return;
}
})();

(I also have a cron job at midnight to stop and restart the services for belt and braces)

I dont speak JS but this looks simple enough, I will give it a go and keep you updated.

Didnt solve the problem, but it seems to catch the actual error message, here is the new log tail:

[INFO][Sat Feb 06 2021 17:47:53 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Sat Feb 06 2021 17:48:21 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 15.639936, Upload: 7.892368, Ping: 41.596
[INFO][Sat Feb 06 2021 17:48:21 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Sat Feb 06 2021 18:18:21 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Sat Feb 06 2021 18:18:47 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 16.274752, Upload: 7.616248, Ping: 23.327
[INFO][Sat Feb 06 2021 18:18:47 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Sat Feb 06 2021 18:48:47 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
[2021-02-06 18:48:52.194] [error] Trying to get interface information on non-initialized socket.
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Sat Feb 06 2021 18:49:23 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Sat Feb 06 2021 18:49:49 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Sat Feb 06 2021 18:50:13 GMT+0000 (Coordinated Universal Time)] Starting speedtest...

I think the problem is that the speedtest called by the execa function in the script does not return, i am trying this now:

const getSpeedMetrics = async () => {
  const { stdout } = await execa("speedtest", [
    "--accept-license",
    "--accept-gdpr",
    "-f",
    "json"
  ], {timeout:600000 });
  const result = JSON.parse(stdout);
  return {
    upload: bitToMbps(result.upload.bandwidth),
    download: bitToMbps(result.download.bandwidth),
    ping: result.ping.latency
  };
};

Adding the timeout option to the execa command will not allow the script to hang indefinetly here. Doing this with a fixed time constant is a bit of a hack of course because the timeout value shold probably correspond to the frequency of / time in between the speedtests performed.

This solves the issue, apparently. The same error that lead to hanging speedtest container still occurs in the logs, but i get valid measurements afterwards.

@frdmn : I dont know if this is in your interest to fix, but since I was not the only one with that problem, it may be worth a look. The timeout value of 600000 assumes that no single run of speedtest should take longer than 10 minutes, but there may very well be a more intelligent solution to this particular problem.

My docker log for the speedtest container now looks like this:

[INFO][Fri Feb 12 2021 10:17:09 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 18.183192, Upload: 16.04832, Ping: 23.385
[INFO][Fri Feb 12 2021 10:17:09 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 10:47:09 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 10:47:36 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 17.74256, Upload: 15.348288, Ping: 23.627
[INFO][Fri Feb 12 2021 10:47:36 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 11:17:36 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 11:18:03 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 11.21484, Upload: 12.840128, Ping: 20.978
[INFO][Fri Feb 12 2021 11:18:03 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 11:48:03 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 11:48:23 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 13.343568, Upload: 15.243879999999999, Ping: 23.854
[INFO][Fri Feb 12 2021 11:48:23 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 12:18:23 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Fri Feb 12 2021 12:18:42 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 12:19:03 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 7.048976, Upload: 8.519168, Ping: 30.478
[INFO][Fri Feb 12 2021 12:19:03 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 12:49:03 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Fri Feb 12 2021 12:49:24 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Fri Feb 12 2021 12:49:46 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
[2021-02-12 12:49:51.910] [error] Trying to get interface information on non-initialized socket.
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Fri Feb 12 2021 12:50:16 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
[2021-02-12 12:50:21.567] [error] Trying to get interface information on non-initialized socket.
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Fri Feb 12 2021 12:50:43 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 12:51:15 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 8.07244, Upload: 11.461392, Ping: 39.69
[INFO][Fri Feb 12 2021 12:51:15 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 13:21:15 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 13:21:41 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 8.237024, Upload: 12.776200000000001, Ping: 20.951
[INFO][Fri Feb 12 2021 13:21:41 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 13:51:42 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 13:52:10 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 7.75488, Upload: 13.530280000000001, Ping: 43.658
[INFO][Fri Feb 12 2021 13:52:10 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 14:22:10 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
{"error":"Cannot open socket: Timeout occurred in connect."}
[INFO][Fri Feb 12 2021 14:22:28 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 14:22:50 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 5.982, Upload: 13.633928, Ping: 44.647
[INFO][Fri Feb 12 2021 14:22:50 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 14:52:50 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 14:53:13 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 12.728192, Upload: 14.131472, Ping: 23.004
[INFO][Fri Feb 12 2021 14:53:13 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...
[INFO][Fri Feb 12 2021 15:23:13 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
[INFO][Fri Feb 12 2021 15:23:36 GMT+0000 (Coordinated Universal Time)] Speedtest results - Download: 13.651544, Upload: 15.090912, Ping: 28.44
[INFO][Fri Feb 12 2021 15:23:36 GMT+0000 (Coordinated Universal Time)] Sleeping for 1800 seconds...


frdmn commented

Appreciate the report and investigation on this. I'll try to take a look and fix this as soon as I have a bit more spare time.

Of course PR's are always welcome, if someone wants to work on this.

I have the same/similar problem. It worked for the first 3 speed test results, then failed with:

[INFO][Mon Dec 12 2022 21:05:46 GMT+0000 (Coordinated Universal Time)] Starting speedtest...
Command failed with exit code 2: speedtest --accept-license --accept-gdpr -f json
[2022-12-12 21:05:51.618] [error] Configuration - Couldn't resolve host name (HostNotFoundException)
[2022-12-12 21:05:51.618] [error] Configuration - Cannot retrieve configuration document (0)
[2022-12-12 21:05:51.618] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
[2022-12-12 21:05:51.618] [error] ConfigurationError - Could not retrieve or read configuration (Configuration)
{"type":"log","timestamp":"2022-12-12T21:05:51Z","message":"Configuration - Could not retrieve or read configuration (ConfigurationError)","level":"error"}

My installation this has this in index.js:

const getSpeedMetrics = async () => {
  const args = (process.env.SPEEDTEST_SERVER) ?
    [ "--accept-license", "--accept-gdpr", "-f", "json", "--server-id=" + process.env.SPEEDTEST_SERVER] :
    [ "--accept-license", "--accept-gdpr", "-f", "json" ];

  const { stdout } = await execa("speedtest", args);  
  const result = JSON.parse(stdout);
  return {
    upload: bitToMbps(result.upload.bandwidth),
    download: bitToMbps(result.download.bandwidth),
    ping: result.ping.latency
  };
};

Should I replace it with what @mmaxbar added above ?