evryfs/github-actions-runner-operator

Help debugging runner deployment "Operation cannot be fulfilled..."

brian-pickens opened this issue · 5 comments

Hey @davidkarlsen,

I appreciate that you are so active in this project. Its been a very useful tool for us. Recently after trying to update my runner image which is a custom image based on your runner (just to install a few extra things), I am now unable to get a runner to start. I've verified that the image is being pulled properly. Its not a runner startup issue as the operator seems to be unable to even start a runner. The following logs is the only error I have come across. I've tried destroying the entire deployment and namespace and rebuilding everything including using my original image, but it won't spin up any new images. One recent change that may or may not be relevant is we just enabled SSO for our org but I don't know if or how that would effect a github app. I did regenerate and apply a new app certificate to rule that out.

2022-04-04T12:06:39.631Z ERROR util.api unable to update status {"error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-pool\": the object has been modified; please apply your changes to the latest version and try again"}

Here is the full log

2022-04-04T12:06:21.807Z        INFO    controller-runtime.metrics      metrics server is starting to listen    {"addr": ":8080"}
2022-04-04T12:06:21.809Z        INFO    setup   starting manager
I0404 12:06:21.810499      33 leaderelection.go:248] attempting to acquire leader lease actions-runner/4ef9cd91.tietoevry.com...
2022-04-04T12:06:21.810Z        INFO    starting metrics server {"path": "/metrics"}
I0404 12:06:21.827065      33 leaderelection.go:258] successfully acquired lease actions-runner/4ef9cd91.tietoevry.com
2022-04-04T12:06:21.827Z        DEBUG   events  Normal  {"object": {"kind":"ConfigMap","namespace":"actions-runner","name":"4ef9cd91.tietoevry.com","uid":"00b457a5-2ab3-45a8-96ca-3ac1c998feab","apiVersion":"v1","resourceVersion":"37026813"}, "reason": "LeaderElection", "message": "github-actions-runner-operator-bf669486f-xht8h_7e4e34ca-f5e4-4e68-8898-974ba7c1c90a became leader"}
2022-04-04T12:06:21.827Z        INFO    controller.githubactionrunner   Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-04-04T12:06:21.827Z        INFO    controller.githubactionrunner   Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-04-04T12:06:21.827Z        INFO    controller.githubactionrunner   Starting EventSource    {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "source": "kind source: /, Kind="}
2022-04-04T12:06:21.827Z        INFO    controller.githubactionrunner   Starting Controller     {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner"}
2022-04-04T12:06:21.929Z        INFO    controller.githubactionrunner   Starting workers        {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "worker count": 1}
2022-04-04T12:06:38.603Z        INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:06:39.130Z        INFO    controllers.GithubActionRunner  Registration secret not found, creating {"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:06:39.371Z        INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning earl{"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:06:39.389Z        INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:06:39.621Z        INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning earl{"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:06:39.631Z        ERROR   util.api        unable to update status {"error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-pool\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).manageOutcome
        /workspace/controllers/githubactionrunner_controller.go:181
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).handleScaling
        /workspace/controllers/githubactionrunner_controller.go:122
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).Reconcile
        /workspace/controllers/githubactionrunner_controller.go:97
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-04-04T12:06:39.631Z        ERROR   controller.githubactionrunner   Reconciler error        {"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "name": "runner-pool", "namespace": "actions-runner", "error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"runner-pool\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227
2022-04-04T12:06:39.637Z        INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:06:39.868Z        INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning earl{"githubactionrunner": "actions-runner/runner-pool"}
2022-04-04T12:07:39.884Z        INFO    controllers.GithubActionRunner  Reconciling GithubActionRunner  {"githubactionrunner": "actions-runner/runner-pool"}

Here is the describe on my runner pool

Name:         runner-pool
Namespace:    actions-runner
Labels:       <none>
Annotations:  <none>
API Version:  garo.tietoevry.com/v1alpha1
Kind:         GithubActionRunner
Metadata:
  Creation Timestamp:  2022-04-04T12:06:38Z
  Generation:          2
  Managed Fields:
    API Version:  garo.tietoevry.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:deletionOrder:
        f:maxRunners:
        f:minRunners:
        f:minTtl:
        f:organization:
        f:podTemplateSpec:
          .:
          f:metadata:
            .:
            f:annotations:
              .:
              f:prometheus.io/port:
              f:prometheus.io/scrape:
              f:sensor.falcon-system.crowdstrike.com/injection:
          f:spec:
            .:
            f:affinity:
              .:
              f:podAntiAffinity:
                .:
                f:preferredDuringSchedulingIgnoredDuringExecution:
            f:containers:
            f:volumes:
        f:reconciliationPeriod:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-04-04T12:06:38Z
    API Version:  garo.tietoevry.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
          .:
          k:{"type":"ReconcileSuccess"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:observedGeneration:
            f:reason:
            f:status:
            f:type:
        f:currentSize:
    Manager:         manager
    Operation:       Update
    Time:            2022-04-04T12:06:39Z
  Resource Version:  37028843
  UID:               96a6197c-6da3-438d-b424-c0104b3f21bd
Spec:
  Deletion Order:  LeastRecent
  Max Runners:     8
  Min Runners:     2
  Min Ttl:         0m
  Organization:    foo
  Pod Template Spec:
    Metadata:
      Annotations:
        prometheus.io/port:                              3903
        prometheus.io/scrape:                            true
        sensor.falcon-system.crowdstrike.com/injection:  disabled
    Spec:
      Affinity:
        Pod Anti Affinity:
          Preferred During Scheduling Ignored During Execution:
            Pod Affinity Term:
              Label Selector:
                Match Expressions:
                  Key:       foo.com/pool
                  Operator:  In
                  Values:
                    runner-pool
              Topology Key:  kubernetes.io/hostname
            Weight:          100
      Containers:
        Env:
          Name:   ACTIONS_RUNNER_INPUT_LABELS
          Value:  nonprod
          Name:   DOCKER_TLS_CERTDIR
          Value:  /certs
          Name:   DOCKER_HOST
          Value:  tcp://localhost:2376
          Name:   DOCKER_TLS_VERIFY
          Value:  1
          Name:   DOCKER_CERT_PATH
          Value:  /certs/client
          Name:   GH_ORG
          Value:  fooorg
        Env From:
          Secret Ref:
            Name:           runner-pool-regtoken
        Image:              myrepo/github-actions-runner:latest
        Image Pull Policy:  Always
        Name:               runner
        Resources:
        Volume Mounts:
          Mount Path:  /certs
          Name:        docker-certs
          Mount Path:  /home/runner/_diag
          Name:        runner-diag
          Mount Path:  /home/runner/_work
          Name:        runner-work
        Args:
          --mtu=1430
        Env:
          Name:             DOCKER_TLS_CERTDIR
          Value:            /certs
        Image:              docker:stable-dind
        Image Pull Policy:  Always
        Name:               docker
        Resources:
        Security Context:
          Privileged:  true
        Volume Mounts:
          Mount Path:  /var/lib/docker
          Name:        docker-storage
          Mount Path:  /certs
          Name:        docker-certs
          Mount Path:  /home/runner/_work
          Name:        runner-work
        Image:         quay.io/evryfs/github-actions-runner-metrics:v0.0.3
        Name:          exporter
        Ports:
          Container Port:  3903
          Protocol:        TCP
        Volume Mounts:
          Mount Path:  /_diag
          Name:        runner-diag
          Read Only:   true
      Volumes:
        Empty Dir:
        Name:  runner-work
        Empty Dir:
        Name:  runner-diag
        Empty Dir:
        Name:  mvn-repo
        Empty Dir:
        Name:  docker-storage
        Empty Dir:
        Name:             docker-certs
  Reconciliation Period:  1m
Status:
  Conditions:
    Last Transition Time:  2022-04-04T12:15:42Z
    Message:
    Observed Generation:   2
    Reason:                LastReconcileCycleSucceded
    Status:                True
    Type:                  ReconcileSuccess
  Current Size:            0
Events:                    <none>

Another thing, I have two runner operators running in different clusters. A 'prod' instance, and a 'nonprod' instance. Could it be that it is picking up the runner from the other operator? How would it differentiate between them?

the problem is this:

2022-04-04T12:06:39.868Z        INFO    controllers.GithubActionRunner  Pods and runner API not in sync, returning earl{"githubactionrunner": "actions-runner/runner-pool"}

check for broken runners in the GH UI and forcefully delete them.
AFAIR the pools you create in the two clusters need to have different names.

I appreciate the quick follow up. Here are my current runners. The first runner is from our 'prod' pool. The other three runners are just running locally from my workstation so our CI isn't completely down while I'm troubleshooting this issue. Those three extra runners don't seem to make a difference if they are there either way.

image

Well what do you know. I renamed my runner pool to something more specific and now the runners are starting up.

Closing issue since this was resolved. Feel free to open another issue if anything needs to be fixed.