evryfs/github-actions-runner-operator

Finalization of pods not run when CR is deleted

aroemen opened this issue · 17 comments

Running kubectl apply -f .\gh-runners-linux.yaml creates the runners in my GitHub organization as expected. When I delete them, though (using kubectl delete -f .\gh-runners-linux.yaml), the pods that hosted the runners get stuck in the "Terminating" status.

NAMESPACE                        NAME                                              READY   STATUS        RESTARTS   AGE
github-action-runners            runner-pool-pod-fhthp                             0/3     Terminating   0          4m50s
github-action-runners            runner-pool-pod-wfs62                             0/3     Terminating   0          4m50s
github-actions-runner-operator   github-actions-runner-operator-59b9d486b5-t2p62   1/1     Running       0          5m26s

If I edit a pod and remove the finalizer (garo.tietoevry.com/runner-registration), the pod is deleted as soon as I save that change. I would expect the runner to be removed from my list of GitHub self-hosted runners as well, but it is not. Am I missing something here?

Then there is a problem with unregistration. Please provide the operator logs so I can help you.

@davidkarlsen I don't see any mention of the delete in the operator log. The delete command was issued at 12:34:23 local time (18:34:23 UTC in the log below), and nothing after that entry relates to the delete:

2021-03-11T18:30:34.050Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	controller-runtime.injectors-warning	Injectors are deprecated, and will be removed in v0.10.x
2021-03-11T18:30:34.051Z	INFO	setup	starting manager
I0311 18:30:34.052860       1 leaderelection.go:243] attempting to acquire leader lease github-actions-runner-operator/4ef9cd91.tietoevry.com...
2021-03-11T18:30:34.052Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
I0311 18:30:51.471375       1 leaderelection.go:253] successfully acquired lease github-actions-runner-operator/4ef9cd91.tietoevry.com
2021-03-11T18:30:51.471Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"ConfigMap","namespace":"github-actions-runner-operator","name":"4ef9cd91.tietoevry.com","uid":"830a98c7-1d79-4fd4-8b16-27048338c333","apiVersion":"v1","resourceVersion":"156761"}, "reason": "LeaderElection", "message": "github-actions-runner-operator-59b9d486b5-hbsrz_a1bc3d27-328e-490c-86e3-4e6033887fbf became leader"}
2021-03-11T18:30:51.472Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.573Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.674Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting EventSource	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2021-03-11T18:30:51.775Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting Controller	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}
2021-03-11T18:30:51.775Z	INFO	controller-runtime.manager.controller.githubactionrunner	Starting workers	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "worker count": 1}
2021-03-11T18:30:51.775Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:30:52.172Z	INFO	controllers.GithubActionRunner	Scaling up	{"githubactionrunner": "github-action-runners/runner-pool", "numInstances": 2}
2021-03-11T18:30:52.182Z	INFO	controllers.GithubActionRunner	Creating a new Pod	{"githubactionrunner": "github-action-runners/runner-pool", "Pod.Namespace": "github-action-runners", "Pod.Name": "runner-pool-pod-4ts8j", "result": "created"}
2021-03-11T18:30:52.182Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GithubActionRunner","namespace":"github-action-runners","name":"runner-pool","uid":"377cc688-b76c-4862-b268-3e306e2dc484","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"156732"}, "reason": "Scaling", "message": "Created pod github-action-runners/runner-pool-pod-4ts8j"}
2021-03-11T18:30:52.186Z	INFO	controllers.GithubActionRunner	Creating a new Pod	{"githubactionrunner": "github-action-runners/runner-pool", "Pod.Namespace": "github-action-runners", "Pod.Name": "runner-pool-pod-779pp", "result": "created"}
2021-03-11T18:30:52.186Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GithubActionRunner","namespace":"github-action-runners","name":"runner-pool","uid":"377cc688-b76c-4862-b268-3e306e2dc484","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"156732"}, "reason": "Scaling", "message": "Created pod github-action-runners/runner-pool-pod-779pp"}
2021-03-11T18:30:52.256Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:30:52.401Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:31:52.256Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:32:52.502Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:33:52.687Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:22.734Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:23.141Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}
2021-03-11T18:34:52.876Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-action-runners/runner-pool"}

Sorry, I just noticed I put this on the wrong project. This should probably be on the github-actions-runner-operator project rather than here. Let me know if you want me to move it.

That's strange. What version of the operator are you running?
Can you provide the CR for the runner pool?

I'm running the latest version from the Helm chart, 2.5.10. I'm just testing locally in my Kubernetes environment in Docker on Windows 10.

apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
  namespace: github-action-runners
spec:
  minRunners: 2                # minimum running pods, required
  maxRunners: 6                # max number of pods, required
  reconciliationPeriod: 1m     # How often it will reconcile, optional, default 1m
  organization: MYORG  # the github org, required
  # repository: "theRepoName"  # if runner for repo, optional
  tokenRef:
    key: GH_TOKEN
    name: actions-runner
  podTemplateSpec:
    metadata:
      annotations:
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "3903"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: garo.tietoevry.com/pool
                      operator: In
                      values:
                        - runner-pool
      containers:
        - name: runner
          env:
            - name: RUNNER_DEBUG
              value: "true"
            - name: DOCKER_TLS_CERTDIR
              value: /certs
            - name: DOCKER_HOST
              value: tcp://localhost:2376
            - name: DOCKER_TLS_VERIFY
              value: "1"
            - name: DOCKER_CERT_PATH
              value: /certs/client
            - name: ACTIONS_RUNNER_INPUT_LABELS
              value: linux,x64
            - name: ACTIONS_RUNNER_INPUT_RUNNERGROUP
              value: "Internal"
            - name: GH_ORG
              value: MYORG
            # if runner for repo:
            # - name: GH_REPO
            #   value: theRepoName
          envFrom:
            - secretRef:
                name: runner-pool-regtoken
          # find the fixed-in-time tags at https://quay.io/repository/evryfs/github-actions-runner?tab=tags if you want to avoid pulling on a moving tag
          # due to https://github.com/actions/runner/issues/246 the runner sw needs to be recent
          # you can subscribe to release-feeds at https://github.com/evryfs/github-actions-runner/releases.atom
          image: quay.io/evryfs/github-actions-runner:latest
          imagePullPolicy: Always
          resources: {}
          volumeMounts:
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_diag
              name: runner-diag
            - mountPath: /home/runner/_work
              name: runner-work
            # - mountPath: /home/runner/.m2
            #   name: mvn-repo
            # - mountPath: /home/runner/.m2/settings.xml
            #   name: settings-xml
        - name: docker
          env:
            - name: DOCKER_TLS_CERTDIR
              value: /certs
          image: docker:stable-dind
          imagePullPolicy: Always
          args:
            # See linked issues from: https://github.com/evryfs/github-actions-runner-operator/issues/39
            - --mtu=1430
          resources: {}
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/docker
              name: docker-storage
            - mountPath: /certs
              name: docker-certs
            - mountPath: /home/runner/_work
              name: runner-work
        - name: exporter
          image: quay.io/evryfs/github-actions-runner-metrics:v0.0.3
          ports:
            - containerPort: 3903
              protocol: TCP
          volumeMounts:
            - name: runner-diag
              mountPath: /_diag
              readOnly: true
      volumes:
        - emptyDir: {}
          name: runner-work
        - emptyDir: {}
          name: runner-diag
        - emptyDir: {}
          name: mvn-repo
        - emptyDir: {}
          name: docker-storage
        - emptyDir: {}
          name: docker-certs
        # - configMap:
        #     defaultMode: 420
        #     name: settings-xml
        #   name: settings-xml

I was able to reproduce it. It's an edge case that occurs when you delete the CR itself: the CR is then gone, so the cleanup step that handles finalization (https://github.com/evryfs/github-actions-runner-operator/blob/master/controllers/githubactionrunner_controller.go#L116) is never reached.
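To make this edge case concrete, here is a toy Go model of the flow described above. It is not the operator's actual code: it assumes the pods carry owner references to the CR (so deleting the CR marks them for deletion) and that the reconciler returns early when the CR cannot be found. All type and function names are hypothetical; only the finalizer name comes from this thread.

```go
package main

import "fmt"

// Toy model, not the operator's code. Finalizer name taken from this issue.
const runnerFinalizer = "garo.tietoevry.com/runner-registration"

// pod models the API-server rule: a pod is only removed once it is
// marked for deletion AND its finalizer list is empty.
type pod struct {
	finalizers []string
	deleting   bool
}

type cluster struct {
	poolExists bool            // does the GithubActionRunner CR still exist?
	pods       map[string]*pod // runner pods by name
	registered map[string]bool // runners registered at GitHub
}

// reconcile assumes the usual controller shape: fetch the CR first and
// return early if it is not found.
func (c *cluster) reconcile() {
	if !c.poolExists {
		return // CR gone: the pod cleanup below is never reached
	}
	for name, p := range c.pods {
		if p.deleting {
			delete(c.registered, name) // unregister the runner at GitHub
			p.finalizers = nil         // then release the pod
		}
	}
}

// gc mimics the API server removing terminating pods with no finalizers.
func (c *cluster) gc() {
	for name, p := range c.pods {
		if p.deleting && len(p.finalizers) == 0 {
			delete(c.pods, name)
		}
	}
}

// deleteCR models `kubectl delete -f ...`: the CR disappears at once and
// (assuming owner references) its pods are marked for deletion.
func (c *cluster) deleteCR() {
	c.poolExists = false
	for _, p := range c.pods {
		p.deleting = true
	}
}

func simulate() (stuckPods, stillRegistered int) {
	c := &cluster{
		poolExists: true,
		pods: map[string]*pod{
			"runner-pool-pod-fhthp": {finalizers: []string{runnerFinalizer}},
			"runner-pool-pod-wfs62": {finalizers: []string{runnerFinalizer}},
		},
		registered: map[string]bool{"runner-pool-pod-fhthp": true, "runner-pool-pod-wfs62": true},
	}
	c.deleteCR()
	c.reconcile() // returns early: CR not found
	c.gc()        // pods keep their finalizer, so they stay Terminating
	return len(c.pods), len(c.registered)
}

func main() {
	stuck, reg := simulate()
	fmt.Printf("terminating pods=%d runners still registered=%d\n", stuck, reg)
}
```

Running it prints `terminating pods=2 runners still registered=2`, which matches the original report: the pods stay in Terminating and the runners remain registered at GitHub.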


What would be another way to tear down these resources then?

Hi there,
I have the same issue here:

NAME                    READY   STATUS        RESTARTS   AGE
runner-pool-pod-7qhqc   0/3     Terminating   0          4d6h
runner-pool-pod-d96bw   0/3     Terminating   0          4h38m
runner-pool-pod-w278v   0/3     Terminating   0          4h38m
runner-pool-pod-xbmww   0/3     Terminating   0          4h47m

I can't remove them.
Thank you.

@aroemen @duyhenryer I was able to delete them by removing the finalizers field, i.e. by patching the finalizers list to null:

kubectl patch pod <POD_NAME> -n <NAMESPACE> -p '{"metadata":{"finalizers":null}}'

Yes, and that's what the operator does after deregistering them from GitHub, which is why I am curious what the operator logs say.

@davidkarlsen I posted the operator logs back in March. Do you need additional data?

@aroemen Sorry, I commented on the wrong issue; I was thinking of #232, which was fixed recently. This one (deleting the CR) still needs a fix.

@aroemen #264 will solve this: you will be able to scale the pool to zero first, then delete the CR.

zhsj commented

Maybe the CR should have a finalizer as well.
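One way this suggestion could work, sketched as a toy Go model under stated assumptions: the reconciler adds its own finalizer to the CR (the name `garo.tietoevry.com/pool-cleanup` below is made up), so `kubectl delete` only marks the CR for deletion and it stays readable. The reconciler can then unregister each runner, release the pod finalizers and finally remove its own finalizer. This is not how the operator currently behaves; it only illustrates the general Kubernetes finalizer pattern, and all names are hypothetical.

```go
package main

import "fmt"

// Toy model, not the operator's code.
const (
	poolFinalizer   = "garo.tietoevry.com/pool-cleanup" // hypothetical name
	runnerFinalizer = "garo.tietoevry.com/runner-registration"
)

// object models any Kubernetes object: the API server only removes it once
// it is marked for deletion and has no finalizers left.
type object struct {
	finalizers []string
	deleting   bool
}

type cluster struct {
	pool       *object            // the GithubActionRunner CR (nil once removed)
	pods       map[string]*object // runner pods by name
	registered map[string]bool    // runners registered at GitHub
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// without returns a new slice with every occurrence of s removed.
func without(list []string, s string) []string {
	var out []string
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func (c *cluster) reconcile() {
	if c.pool == nil {
		return // CR fully removed; nothing left to clean up
	}
	if !c.pool.deleting {
		// Ensure the CR carries our finalizer so its deletion waits for us.
		if !contains(c.pool.finalizers, poolFinalizer) {
			c.pool.finalizers = append(c.pool.finalizers, poolFinalizer)
		}
		return // normal scaling logic would go here
	}
	// CR is being deleted but is still readable thanks to its finalizer.
	for name, p := range c.pods {
		delete(c.registered, name) // unregister the runner at GitHub
		p.finalizers = without(p.finalizers, runnerFinalizer)
		p.deleting = true
	}
	c.pool.finalizers = without(c.pool.finalizers, poolFinalizer)
}

// gc mimics the API server removing marked objects with no finalizers.
func (c *cluster) gc() {
	for name, p := range c.pods {
		if p.deleting && len(p.finalizers) == 0 {
			delete(c.pods, name)
		}
	}
	if c.pool != nil && c.pool.deleting && len(c.pool.finalizers) == 0 {
		c.pool = nil
	}
}

func simulate() (pods, registered int, poolGone bool) {
	c := &cluster{
		pool: &object{},
		pods: map[string]*object{
			"runner-pool-pod-a": {finalizers: []string{runnerFinalizer}},
			"runner-pool-pod-b": {finalizers: []string{runnerFinalizer}},
		},
		registered: map[string]bool{"runner-pool-pod-a": true, "runner-pool-pod-b": true},
	}
	c.reconcile()          // first pass adds the CR finalizer
	c.pool.deleting = true // kubectl delete: CR is only marked for deletion
	c.gc()                 // CR still has a finalizer, so it survives
	c.reconcile()          // cleanup branch runs while the CR is readable
	c.gc()
	return len(c.pods), len(c.registered), c.pool == nil
}

func main() {
	pods, regs, gone := simulate()
	fmt.Printf("pods=%d registered=%d poolGone=%v\n", pods, regs, gone)
}
```

It prints `pods=0 registered=0 poolGone=true`: the pods and the CR are removed and no runner is left registered, in contrast to deleting a CR that has no finalizer of its own.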

I'm trying to make this work on the latest build but can't seem to get it working...
$ kubectl patch githubactionrunners.garo.tietoevry.com runner-pool --namespace actions-runner --patch '{"spec":{"minRunners":0}}' --type=merge
This results in:
The GithubActionRunner "runner-pool" is invalid: spec.minRunners: Invalid value: 0: spec.minRunners in body should be greater than or equal to 1

I suspect that either the image I'm pulling is not the latest, or I'm pulling it the wrong way. I pull the operator image using the published Helm chart:
helm upgrade --install --wait github-actions-runner-operator evryfs-oss/github-actions-runner-operator --namespace actions-runner-operator --set githubapp.existingSecret=github-runner-app --set githubapp.enabled=true

The runner image is this one:
quay.io/evryfs/github-actions-runner:latest

What am I missing?

Thx
Tony