Labbs/github-actions-exporter

Runner status reporting is broken

Closed this issue · 2 comments

The GitHub runner status reporting has the following issues:

  • Data overlaps and becomes useless because stale entries are not removed when a runner's state changes
  • Is not organized well for use in Prometheus alerting

An example /metrics snippet from our actual environment:

github_runner_organization_status{id="720",name="github-actions-runners-tkkjp-qqscv",organization="foo",os="linux",status="offline"} 0
github_runner_organization_status{id="720",name="github-actions-runners-tkkjp-qqscv",organization="foo",os="linux",status="online"} 1

Right now there are effectively two separate metrics reporting the status of each runner: one for offline and one for online. The client does not remove the previous metric entry when the state changes, so both states end up being reported simultaneously and you cannot tell which one is current. The offline metric also makes no sense on its own, because foo{status="offline"}==0 should mean "it is not offline", which is not what it means right now.
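To make the overlap concrete, here is a minimal sketch of the behaviour described above. It assumes the exporter writes only the series for the runner's current status and never deletes the old one. The dict-based `registry` and the `report_status_buggy` and `render` helpers are hypothetical stand-ins for illustration, not the exporter's actual code.

```python
# Hypothetical in-memory stand-in for a metrics registry: label set -> value.
registry = {}


def report_status_buggy(registry, runner_id, name, status):
    """Write only the series for the current status; old series are never removed."""
    labels = (("id", runner_id), ("name", name), ("status", status))
    registry[labels] = 1 if status == "online" else 0


def render(registry, metric="github_runner_organization_status"):
    """Render the registry roughly in Prometheus text exposition style."""
    lines = []
    for labels, value in sorted(registry.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{metric}{{{label_str}}} {value}")
    return lines


# The runner is first seen offline, then comes online.
report_status_buggy(registry, "720", "github-actions-runners-tkkjp-qqscv", "offline")
report_status_buggy(registry, "720", "github-actions-runners-tkkjp-qqscv", "online")

stale_output = render(registry)
for line in stale_output:
    print(line)
```

After the transition, both the offline series (value 0) and the online series (value 1) are still exposed, and nothing in the output says which one was written last.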

Recommended Fix Version 1

Since there appear to be only two states, the easiest fix would be to implement a simple "runner is online" metric instead:

runner_online{name="{{name}}",...} = (r.status == "online")

I'm not a GitHub expert; this won't work on its own if runner statuses include more than online and offline.
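Version 1 could look roughly like the sketch below. It assumes each runner is available as a dict with `name` and `status` keys (a hypothetical shape, not the GitHub API's) and that the full set of samples is rebuilt on every scrape, so runners that no longer exist simply disappear from the output.

```python
def runner_online_samples(runners):
    """Map each runner name to 1 if its status is "online", else 0."""
    return {r["name"]: int(r["status"] == "online") for r in runners}


def render_runner_online(runners):
    """Render one runner_online line per runner, rebuilt from scratch each call."""
    samples = runner_online_samples(runners)
    return [
        f'runner_online{{name="{name}"}} {value}'
        for name, value in sorted(samples.items())
    ]


# Hypothetical runner list for illustration.
runners = [
    {"name": "runner-a", "status": "online"},
    {"name": "runner-b", "status": "offline"},
]
online_lines = render_runner_online(runners)
for line in online_lines:
    print(line)
```

With a single series per runner, a value of 0 unambiguously means "not online", which is what an alert would key on.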

Recommended Fix Version 2

You could report a metric for each possible runner status, which would be more flexible if GitHub implements more than "online" and "offline". Always set the value to 1 for "this state is true" or 0 for "this state is false":

runner_status{name="{{name}}",...,status="offline"} = (r.status == "offline")
runner_status{name="{{name}}",...,status="online"} = (r.status == "online")
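A sketch of Version 2, under the assumption that the set of possible statuses is known up front. `KNOWN_STATUSES` and `runner_status_samples` are hypothetical names, and the two statuses listed are only the ones mentioned in this issue.

```python
# Assumption: only the two statuses discussed in this issue exist.
KNOWN_STATUSES = ("offline", "online")


def runner_status_samples(name, current_status, known_statuses=KNOWN_STATUSES):
    """Return one (series, value) pair per known status.

    The value is 1 for the status the runner is currently in and 0 for
    every other status, so each series is meaningful on its own.
    """
    return [
        (f'runner_status{{name="{name}",status="{s}"}}', int(s == current_status))
        for s in known_statuses
    ]


status_samples = runner_status_samples("runner-a", "online")
for series, value in status_samples:
    print(series, value)
```

Because every status is written on every update, exactly one series per runner has the value 1, and an expression such as runner_status{status="offline"} == 1 reads as "this runner is offline".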

Hello, you just need to fix the conflict first, and then I will merge your PR.

@Labbs should be fixed now.