evryfs/github-actions-runner-operator

Runner pool crashes operator

Closed · 2 comments

Not sure what the issue is. Everything has been solid up to this point, but today I re-deployed my runner pool with increased resources, and now every deployment of the runner pool crashes the operator with this message in the operator logs:

ERROR	util.api	unable to update status	{"error": "context canceled"}
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).manageOutcome
	/workspace/controllers/githubactionrunner_controller.go:181
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).handleScaling
	/workspace/controllers/githubactionrunner_controller.go:111
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).Reconcile
	/workspace/controllers/githubactionrunner_controller.go:97
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-06-28T23:17:36.742Z	ERROR	controller.githubactionrunner	Reconciler error	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "name": "runner-pool", "namespace": "runner-pool", "error": "context canceled"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-06-28T23:17:36.743Z	INFO	controller.githubactionrunner	All workers finished	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}

I have tried moving the runner pool to a different namespace that the operator is not watching, but the operator deployment still crashes.

I have also tried applying the latest CRD in case something had been updated, but that did not help either.

Any insight would be very helpful; I am going to poke around at the controller code to see if I can figure this out.

It seems like the controller cannot reach the internal Kubernetes API.

GitHub had some API issues this morning, which leads me to believe they may have begun last night and caused the above error. Closing this issue, as it was not related to the operator.