Add status to operator when it has trouble talking to GitHub
bgolding355 opened this issue · 2 comments
The Problem
GitHub Outage
Within the last day, GitHub had a significant outage: https://www.githubstatus.com/incidents/sksd097hm0y5?utm_ts=1647526099
While this outage occurred, github-actions-runner-operator experienced errors communicating with GitHub.
While trying to debug this, I visited https://github.com/settings/tokens and tried to generate a token, the result of which was:
Runner Pod Logs
Http response code: InternalServerError from 'POST https://api.github.com/actions/runner-registration'
Operator Pod Logs
ERROR controller.githubactionrunner Reconciler error
{
"reconciler group": "garo.tietoevry.com",
"reconciler kind": "GithubActionRunner",
"name": "basic-runner-pool",
"namespace": "github",
"error": "POST https://api.github.com/orgs/GarnerCorp/actions/runners/registration-token: 500 []"
}
DEBUG events Warning
{
"object": {
"kind": "GithubActionRunner",
"namespace": "e2e-tests",
"name": "e2e-runner-pool",
"uid": "c855cc0e-a161-4673-b7ae-4c4e316f00bf",
"apiVersion": "garo.tietoevry.com/v1alpha1",
"resourceVersion": "533367216"
},
"reason": "ProcessingError",
"message": "failed to get installation for owner \"GarnerCorp\": GET https://api.github.com/orgs/GarnerCorp/installation: 500 []"
}
Environment
I am using:
dependencies:
- name: github-actions-runner-operator
version: 2.5.5
repository: https://evryfs.github.io/helm-charts/
I have since upgraded to 2.7.0 but the behavior persists.
Proposed Enhancements
When a GitHub API call fails, it would be very useful to add a status to the GithubActionRunner
resource saying that it is having an issue, especially when the failure is a reconciler error.
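To make the request concrete, here is a minimal sketch of the kind of status condition that could be recorded when a GitHub API call fails. It is not the operator's implementation (the operator is not written in Python). The function name `condition_from_error` is hypothetical, and the camelCase keys assume the usual Kubernetes condition layout. The `ReconcileError` type and `LastReconcileCycleFailed` reason are taken from the output quoted later in this thread.

```python
from datetime import datetime, timezone


def condition_from_error(error, observed_generation, now=None):
    """Build a Kubernetes-style status condition for a failed reconcile.

    Hypothetical helper for illustration only; key names assume the
    conventional condition layout (type, status, reason, message, ...).
    """
    now = now or datetime.now(timezone.utc)
    return {
        "type": "ReconcileError",
        "status": "True",
        "reason": "LastReconcileCycleFailed",
        # Surface the underlying API error text so users can see it
        # on the resource instead of only in the operator pod logs.
        "message": str(error),
        "observedGeneration": observed_generation,
        "lastTransitionTime": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    err = RuntimeError(
        "POST https://api.github.com/orgs/GarnerCorp/actions/runners/"
        "registration-token: 500 []"
    )
    print(condition_from_error(err, observed_generation=1))
```

With a condition like this on the resource, the failure would be visible from the GithubActionRunner itself rather than only from the operator logs.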
Thank you for the thorough report. This should already be the case. Example from our cluster:
k describe gar dts-default-pool|tail -27
Reconciliation Period: 30s
Status:
Conditions:
Last Transition Time: 2022-03-17T20:19:30Z
Message:
Observed Generation: 50
Reason: LastReconcileCycleSucceded
Status: True
Type: ReconcileSuccess
Last Transition Time: 2022-03-17T15:22:43Z
Message: failed to get installation for owner "<redacted>": GET https://api.github.com/orgs/<redacted>/installation: 500 []
Observed Generation: 50
Reason: LastReconcileCycleFailed
Status: True
Type: ReconcileError
Current Size: 1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scaling 151m GithubActionRunner Created pod garo-default-runner-pool/dts-default-pool-pod-cxdh9
Normal Scaling 150m GithubActionRunner garo-default-runner-pool/dts-default-pool-pod-52qvs
Normal Scaling 147m GithubActionRunner Created pod garo-default-runner-pool/dts-default-pool-pod-txgtr
Normal Scaling 146m GithubActionRunner Created pod garo-default-runner-pool/dts-default-pool-pod-r8gdk
Normal Scaling 146m GithubActionRunner Created pod garo-default-runner-pool/dts-default-pool-pod-7mnmv
Normal Scaling 120m GithubActionRunner garo-default-runner-pool/dts-default-pool-pod-cxdh9
Normal Scaling 114m GithubActionRunner garo-default-runner-pool/dts-default-pool-pod-txgtr
Normal Scaling 113m GithubActionRunner garo-default-runner-pool/dts-default-pool-pod-r8gdk
Do you have the latest CRD applied in the cluster?
Closing due to inactivity and because this is already supported. Feel free to re-open should there be anything else.
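For anyone checking this from a script, the following is a rough sketch of how the conditions shown above could be inspected. It assumes the resource has been fetched as JSON (for example with `kubectl get ... -o json`) and that the keys follow the usual camelCase form of the fields printed by `describe`. It also assumes the condition with the most recent transition time reflects the current state. `latest_condition` and the sample data are hypothetical, with values copied from the output above.

```python
from datetime import datetime


def latest_condition(resource):
    """Return the condition with the most recent lastTransitionTime, or None.

    Assumes RFC 3339 UTC timestamps such as 2022-03-17T20:19:30Z.
    """
    conditions = resource.get("status", {}).get("conditions", [])
    if not conditions:
        return None
    return max(
        conditions,
        key=lambda c: datetime.strptime(
            c["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
        ),
    )


# Hypothetical excerpt mirroring the describe output quoted above.
example = {
    "status": {
        "conditions": [
            {
                "type": "ReconcileSuccess",
                "status": "True",
                "reason": "LastReconcileCycleSucceded",
                "lastTransitionTime": "2022-03-17T20:19:30Z",
            },
            {
                "type": "ReconcileError",
                "status": "True",
                "reason": "LastReconcileCycleFailed",
                "lastTransitionTime": "2022-03-17T15:22:43Z",
            },
        ]
    }
}

if __name__ == "__main__":
    current = latest_condition(example)
    print(current["type"], current["reason"])
```

If the resource has no `status.conditions` at all, the function returns `None`, which may be a hint that an older CRD without the status fields is installed.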