evryfs/github-actions-runner-operator

Multiple runner operators on different GKE clusters

Opened this issue · 1 comment

We have multiple environments set up for development/staging work and I am trying to migrate our runner operator and runner pool to a new GKE cluster. Currently our development ecosystem (on GKE) is working as expected with the runner operator scheduling pods as new jobs come up.

This is all configured on the same GH org.

I am trying to migrate to a new cluster, reusing the same GitHub App credentials as a K8s secret in the new cluster (assuming the GitHub App can be reused). I have installed the runner operator via Helm and supplied the GitHub App secrets in the values file. The operator installs with no issues observed in the logs.

When I install the runner pool on the new cluster it shows ReconcileSuccess; however, the Current size is always 0.

I have tried:

  • updating the CRD
  • setting minRunners to 5
  • testing on a freshly built cluster, with the same result (ReconcileSuccess, Current size: 0)

Nothing seems to work.

The runner operator logs do not give any indication of why I am not seeing any runners, everything appears to be working:

2022-07-06T14:58:57.622Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2022-07-06T14:58:57.622Z	INFO	setup	starting manager
I0706 14:58:57.622640       1 leaderelection.go:248] attempting to acquire leader lease runner-operator/4ef9cd91.tietoevry.com...
2022-07-06T14:58:57.622Z	INFO	starting metrics server	{"path": "/metrics"}
I0706 14:59:13.859538       1 leaderelection.go:258] successfully acquired lease runner-operator/4ef9cd91.tietoevry.com
2022-07-06T14:59:13.859Z	DEBUG	events	Normal	{"object": {"kind":"ConfigMap","namespace":"runner-operator","name":"4ef9cd91.tietoevry.com","uid":"33346577-5c1c-4d78-82b4-79d1e191147b","apiVersion":"v1","resourceVersion":"14561"}, "reason": "LeaderElection", "message": "github-actions-runner-operator-fd84696f-2l2x2_2f916b5b-f9e3-4982-85f8-501181d79b2d became leader"}
2022-07-06T14:59:13.859Z	INFO	controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-07-06T14:59:13.859Z	INFO	controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-07-06T14:59:13.859Z	INFO	controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-07-06T14:59:13.859Z	INFO	controller.githubactionrunner	Starting Controller	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}
2022-07-06T14:59:13.961Z	INFO	controller.githubactionrunner	Starting workers	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "worker count": 1}
2022-07-06T14:59:19.414Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:19.760Z	INFO	controllers.GithubActionRunner	Registration secret not found, creating	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:19.976Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:20.136Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:20.325Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:50.136Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T14:59:50.316Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:20.367Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:20.585Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:50.604Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "runner-operator/runner-pool"}
2022-07-06T15:00:50.863Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "runner-operator/runner-pool"}

Thank you for any pointers you can provide.

I believe I finally figured this out: in my runner spec YAML file, I changed the name of the runner pool (previously named runner-pool) and am now seeing runner pods in my new cluster.

apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool-test-01
  namespace: runner-operator
spec:
  minRunners: 2
  maxRunners: 20
  organization: myOrgo
  reconciliationPeriod: 30s
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: garo.tietoevry.com/pool
                      operator: In
                      values:
                        - runner-pool-test-01

Even when the operator runs on a completely different cluster, reusing the same runner pool name in the runner spec results in zero runner pods being created.
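A minimal sketch of what appears to be happening, assuming the operator matches org-registered runners to its local pods by the CR name prefix (illustrative Python only, not the operator's actual Go code; `pods_and_api_in_sync` is a hypothetical helper). Runner registrations are org-scoped on GitHub, so runners registered by the old cluster under the same pool name show up in the API view but have no matching pods in the new cluster, and reconciliation returns early:

```python
def pods_and_api_in_sync(local_pod_names, org_runner_names, pool_name):
    """Return True when every org-registered runner for this pool
    has a matching local pod, and vice versa (hypothetical sketch
    of the operator's "Pods and runner API not in sync" check)."""
    # Runners whose names start with the CR name are attributed to this pool.
    api_view = {name for name in org_runner_names if name.startswith(pool_name)}
    return api_view == set(local_pod_names)


# Runners the old cluster registered in the shared org (example names).
old_cluster_runners = ["runner-pool-abcde", "runner-pool-fghij"]

# New cluster, same pool name "runner-pool": the API reports runners this
# cluster has no pods for, so the views never converge.
print(pods_and_api_in_sync([], old_cluster_runners, "runner-pool"))          # False

# Renamed pool: no colliding registrations, so reconciliation can proceed.
print(pods_and_api_in_sync([], old_cluster_runners, "runner-pool-test-01"))  # True
```

Under this assumption, any name unique within the GitHub org avoids the collision, which matches the observed fix of renaming the pool.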

Would it be useful to update the README to mention running the operator/runner pool across multiple clusters/lifecycles/environments?
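If the README does get updated, one convention it could suggest (hypothetical names, same CR fields as the spec above) is encoding the cluster or environment into each pool's name, so registrations in the shared org can never collide:

```yaml
# Hypothetical per-cluster naming: each GithubActionRunner gets a name
# that is unique within the GitHub org, e.g. runner-pool-gke-dev on the
# dev cluster and runner-pool-gke-staging on staging.
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool-gke-staging
  namespace: runner-operator
spec:
  minRunners: 2
  maxRunners: 20
  organization: myOrgo
  reconciliationPeriod: 30s
```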