evryfs/github-actions-runner-operator

Runner deployment Help

Anusha-Kolli opened this issue · 21 comments

Hi,

I need help with the runner deployment. I deployed the operator, but when I try to deploy runner pods, they crash-loop with an error saying the RUNNER_TOKEN variable must be defined.

I have already created a secret containing the PAT.

Thanks

It needs to be referenced like this: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L56


Here is my CRD. I referenced the secret the same way as above, but I still get that error:
test.pdf

Now it says: Error: secret "runner-pool-regtoken" not found

What value should the secret "runner-pool-regtoken" contain?

@Anusha-Kolli That secret is created (and refreshed) automatically by the controller; it holds a registration token for the runner. In the CRD, change the reference to runner-pool-regtoken where you currently have:

envFrom:
  - secretRef:
      name: actions-runner  # <-- here

as in the reference, and you should be good to go, provided you run a recent version.
Also, the lifecycle element should not be needed.
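For reference, a minimal sketch of what the runner container's `envFrom` could look like once it points at the operator-managed secret. It assumes a `GithubActionRunner` named `runner-pool` (as in the sample), so the operator-created secret is `runner-pool-regtoken`; the surrounding pod-template layout is an assumption, not copied from the sample.

```yaml
# Excerpt of the runner pod template inside the GithubActionRunner CR.
# Assumption: the CR is named "runner-pool", so the operator creates and
# refreshes a Secret named "runner-pool-regtoken" in the CR's namespace.
containers:
  - name: runner
    envFrom:
      # Do not point this at the PAT secret (e.g. "actions-runner");
      # it must reference the operator-managed registration-token secret.
      - secretRef:
          name: runner-pool-regtoken
```

Because `envFrom` exposes each key of the referenced secret as an environment variable, this is presumably how the runner image ends up with the `RUNNER_TOKEN` variable its entrypoint checks for. If your CR has a different name, the secret name changes accordingly (`<NameOfRunner>-regtoken`).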

What version of the controller do you run?

@davidkarlsen I am running the latest version.
runner-pool-regtoken didn't work, but I saw that a runner-regtoken secret had been created; after referencing that in the CRD, I am able to deploy.

However, the pod is crash-looping. Here are the runner, docker, and exporter container logs.

**runner logs:**

ldd: ./bin/System.Security.Cryptography.Native.OpenSsl.so: No such file or directory
ldd: ./bin/System.IO.Compression.Native.so: No such file or directory

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration



√ Runner successfully added
√ Runner connection is good

# Runner settings


√ Settings Saved.


√ Connected to GitHub

2021-01-21 04:07:12Z: Listening for Jobs


**But there is a problem in the docker container logs:**

```
kubectl logs -f runner-pod-c9kr8 -c docker --namespace actions-runner
Must define RUNNER_TOKEN variable
```



**exporter logs:**

```
kubectl logs -f runner-pod-c9kr8 -c exporter --namespace actions-runner
I0121 04:07:07.473507       1 main.go:103] mtail version v3.0.0-rc36 git revision 7825f115dd3ed9f623377821c0351d1eb7aa3a5a go version go1.14.4 go arch amd64 go os linux
I0121 04:07:07.473763       1 main.go:104] Commandline: ["mtail" "-logtostderr" "-logs" "/_diag/*" "-progs" "/progs"]
I0121 04:07:07.474447       1 log_watcher.go:249] No abspath in watched list, added new one for /progs
I0121 04:07:07.474761       1 loader.go:229] Loaded program jobmetrics.mtail
I0121 04:07:07.474812       1 log_watcher.go:249] No abspath in watched list, added new one for /_diag
I0121 04:07:07.474888       1 log_watcher.go:249] No abspath in watched list, added new one for /_diag/Runner_20210121-040707-utc.log
I0121 04:07:07.474931       1 log_watcher.go:254] Found this processor in watched list
I0121 04:07:07.474983       1 log_watcher.go:254] Found this processor in watched list
I0121 04:07:07.475023       1 tail.go:315] Tailing /_diag/Runner_20210121-040707-utc.log
I0121 04:07:07.475114       1 store.go:136] Starting metric store expiry loop every 1h0m0s
I0121 04:07:07.475364       1 mtail.go:341] Listening on [::]:3903
I0121 04:07:07.475519       1 tail.go:461] Starting log handle expiry loop every 1h0m0s
I0121 04:07:10.728620       1 log_watcher.go:254] Found this processor in watched list
I0121 04:07:10.728705       1 log_watcher.go:249] No abspath in watched list, added new one for /_diag/Runner_20210121-040710-utc.log
I0121 04:07:10.728728       1 tail.go:315] Tailing /_diag/Runner_20210121-040710-utc.log
```

I'm not sure how to proceed from here.

Thank you so much, Felix, for the new image.

I tried this image, but my runner pod still crash-loops with this error:

The runner container logs look fine and I can see the runner in the GitHub repo, but the docker container shows this:

kubectl logs runner-pod-pfxpf -c docker --namespace actions-runner


--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication

Http response code: NotFound from 'POST https://api.github.com/actions/runner-registration'
{"message":"Not Found","documentation_url":"https://docs.github.com/rest"}
Response status code does not indicate success: 404 (Not Found).



Please help me proceed from here.

Thanks

I think the runner image is the problem. If you follow the example docs, use this one instead: https://quay.io/repository/evryfs/github-actions-runner?tab=tags, which is tailored to suit the operator.


Did that work out for you?

Hey @davidkarlsen, just chiming in here to report that we are seeing the same error as the one @Anusha-Kolli last reported.
In our case the operator worked well when we first deployed it a few months ago (using the image you've suggested in the previous comment) and broke some time ago. I'm not entirely sure why.
I also can't find any documentation about the API endpoint returning the error in GitHub's API reference, and since it returns a 404 I suspect something changed in GitHub's Actions API and the client should be updated. WDYT?

@yaron-idan please run the latest operator version and the latest runner image, and it should work fine.

Thanks @davidkarlsen, I'm actually running the latest runner image and the latest operator image, and the same error still happens. GitHub Support suggested this is an authentication error, so I'm trying to understand whether something is wrong with the PAT I'm supplying to the runner, but I can't find a way to figure this out.

Here are the details of the versions I'm running, in case I've missed something:
runner image - latest (digest: @sha256:92a71e96865f4066cca8e08a7ea0ef2f5216bf164848f41d3348c8090fe3d5c9)
operator version - v0.8.3
chart version - 2.5.4

Any idea how can I further troubleshoot this issue?

Those look correct, thanks. The PAT you use in https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L17 needs sufficient permissions/scopes to register a runner: https://docs.github.com/en/rest/reference/actions#self-hosted-runners.

Is the runner repo-scoped or org-scoped? Note that the org needs to be defined in the CR in both cases.
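A hedged sketch of the two pieces mentioned above: a Secret holding the PAT, and the CR excerpt that references it and names the org. The secret name `actions-runner` and key `GH_TOKEN` match the `tokenRef` used elsewhere in this thread; the `organization` field name, the org value `my-org`, and the namespace are assumptions for illustration.

```yaml
# Secret holding the PAT. The PAT needs scopes that allow registering
# self-hosted runners (see the GitHub docs linked above).
apiVersion: v1
kind: Secret
metadata:
  name: actions-runner
  namespace: actions-runner      # assumption: same namespace as the CR
type: Opaque
stringData:
  GH_TOKEN: "<your-PAT>"         # placeholder, never commit a real token
---
# Excerpt of the GithubActionRunner CR (other required fields omitted).
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
  namespace: actions-runner
spec:
  organization: my-org           # hypothetical value; assumed field name
  tokenRef:
    name: actions-runner         # the Secret above
    key: GH_TOKEN                # key inside that Secret
```

The operator uses this PAT to create the registration token; the runner pods never need the PAT themselves.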

Also, what do the operator logs look like?


Did that work out for you?

@davidkarlsen yes

How about you, @yaron-idan?

Still having issues. I've created a PAT with all the required permissions and I still hit the same error. The org is specified in the operator spec like so:

- name: runner
  env:
    - name: GH_ORG
      value: {{ $githubOrg }}

The $githubOrg variable is assigned a value from our values.yaml file earlier in the template file.

The operator throws these errors when scaling up another runner:

2021-01-28T13:29:22.230Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}
2021-01-28T13:29:22.721Z	INFO	controllers.GithubActionRunner	Scaling up	{"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi", "numInstances": 1}
2021-01-28T13:29:22.721Z	INFO	controllers.GithubActionRunner	Registration token expired, updating	{"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}
2021-01-28T13:29:29.417Z	INFO	controllers.GithubActionRunner	Creating a new Pod	{"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi", "Pod.Namespace": "github-runners", "Pod.Name": "kubernetes-500m-cores-1024mi-pod-wmf2k", "result": "created"}
2021-01-28T13:29:29.417Z	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"GithubActionRunner","namespace":"github-runners","name":"kubernetes-500m-cores-1024mi","uid":"c09508c2-0c62-4d1b-ab5f-602fc404450e","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"483312812"}, "reason": "Scaling", "message": "Created pod github-runners/kubernetes-500m-cores-1024mi-pod-wmf2k"}
2021-01-28T13:29:29.427Z	DEBUG	controller-runtime.manager.events	Warning	{"object": {"kind":"GithubActionRunner","namespace":"github-runners","name":"kubernetes-500m-cores-1024mi","uid":"c09508c2-0c62-4d1b-ab5f-602fc404450e","apiVersion":"garo.tietoevry.com/v1alpha1","resourceVersion":"483312812"}, "reason": "ProcessingError", "message": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"kubernetes-500m-cores-1024mi\": the object has been modified; please apply your changes to the latest version and try again"}
2021-01-28T13:29:29.431Z	ERROR	util.api	unable to update status	{"error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"kubernetes-500m-cores-1024mi\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/redhat-cop/operator-utils/pkg/util.(*ReconcilerBase).ManageErrorWithRequeue
	/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.1.1/pkg/util/reconciler.go:388
github.com/redhat-cop/operator-utils/pkg/util.(*ReconcilerBase).ManageOutcomeWithRequeue
	/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.1.1/pkg/util/reconciler.go:365
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).manageOutcome
	/workspace/controllers/githubactionrunner_controller.go:179
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).handleScaling
	/workspace/controllers/githubactionrunner_controller.go:135
github.com/evryfs/github-actions-runner-operator/controllers.(*GithubActionRunnerReconciler).Reconcile
	/workspace/controllers/githubactionrunner_controller.go:96
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.0/pkg/internal/controller/controller.go:293
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.0/pkg/internal/controller/controller.go:248
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99
2021-01-28T13:29:29.432Z	ERROR	controller-runtime.manager.controller.githubactionrunner	Reconciler error	{"reconciler group": "garo.tietoevry.com", "reconciler kind": "GithubActionRunner", "name": "kubernetes-500m-cores-1024mi", "namespace": "github-runners", "error": "Operation cannot be fulfilled on githubactionrunners.garo.tietoevry.com \"kubernetes-500m-cores-1024mi\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.0/pkg/internal/controller/controller.go:297
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.0/pkg/internal/controller/controller.go:248
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99
2021-01-28T13:29:29.432Z	INFO	controllers.GithubActionRunner	Reconciling GithubActionRunner	{"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}
2021-01-28T13:29:29.828Z	INFO	controllers.GithubActionRunner	Pods and runner API not in sync, returning early	{"githubactionrunner": "github-runners/kubernetes-500m-cores-1024mi"}

Any idea what the issue is here, @davidkarlsen?

Thanks for all these swift responses! We really appreciate you devoting time to our issues, and the product you've written. Can't wait for it to work properly again.

@yaron-idan those "unable to update status" errors are not relevant and can be ignored.

So the operator will:

  1. create/update a token in the namespace where the CR is defined called <NameOfRunner>-regtoken, this token needs to be funneled into the pod like here: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L56
  2. that in turn will be picked up by the scheduled runner pod: https://github.com/evryfs/github-actions-runner/blob/master/entrypoint.sh#L4

For the operator to be able to create/refresh that token, it is important that you have defined your PAT in the secret referenced here: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L17

For org-wide runners, that's all that is needed.

If, however, you have a repo-scoped runner, you have to set the repo (to the same value) in both places:

  1. https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L16 and
  2. https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L54
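For a repo-scoped runner, the two settings could look roughly like this. The spec field `repository`, the pod-template field `podTemplateSpec`, and the env var `GH_REPO` are assumptions about what the linked sample lines contain, so verify them against the sample file; `my-org` and `my-repo` are placeholders.

```yaml
# Hypothetical repo-scoped GithubActionRunner excerpt; verify field names
# against the linked sample before use.
apiVersion: garo.tietoevry.com/v1alpha1
kind: GithubActionRunner
metadata:
  name: runner-pool
spec:
  organization: my-org           # org is still required
  repository: my-repo            # 1. repo on the CR spec (assumed field name)
  tokenRef:
    name: actions-runner
    key: GH_TOKEN
  podTemplateSpec:               # assumed field name for the runner pod template
    spec:
      containers:
        - name: runner
          env:
            - name: GH_ORG
              value: my-org
            - name: GH_REPO      # 2. same repo value for the runner (assumed env name)
              value: my-repo
          envFrom:
            - secretRef:
                name: runner-pool-regtoken   # <NameOfRunner>-regtoken
```

For an org-wide runner, both repo settings are simply left out.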

Full logs from the operator and/or runners will tell us what is going on (you can link some gists).

Hope this helps!


@davidkarlsen, I can deploy the GH operator and runner locally on Docker Desktop. I deploy the runner CRD with a Helm chart, using volume claims (with subPaths) instead of empty directories, and that works fine locally. But when I deploy to my org's Kubernetes cluster, the docker and exporter containers run fine while the runner container crash-loops with the error below.

I am using the ubuntu20-20201210.0-2.276.1 image.

ldd: ./bin/libSystem.Security.Cryptography.Native.OpenSsl.so: No such file or directory
ldd: ./bin/libSystem.IO.Compression.Native.so: No such file or directory
Unhandled exception. System.UnauthorizedAccessException: Access to the path '/home/runner/_diag/Runner_20210128-164015-utc.log' is denied.
---> System.IO.IOException: Permission denied
--- End of inner exception stack trace ---
at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
at System.IO.FileStream.OpenHandle(FileMode mode, FileShare share, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize)
at GitHub.Runner.Common.HostTraceListener.CreatePageLogWriter()
at GitHub.Runner.Common.HostTraceListener..ctor(String logFileDirectory, String logFilePrefix, Int32 pageSizeLimit, Int32 retentionDays)
at GitHub.Runner.Common.HostContext..ctor(String hostType, String logFile)
at GitHub.Runner.Listener.Program.Main(String[] args)
./config.sh: line 81: 30 Aborted (core dumped) ./bin/Runner.Listener configure "$@" 

Any help would be appreciated.
Thanks

> 1. create/update a token in the namespace where the CR is defined called <NameOfRunner>-regtoken, this token needs to be funneled into the pod like here: https://github.com/evryfs/github-actions-runner-operator/blob/master/config/samples/garo_v1alpha1_githubactionrunner.yaml#L56

That was it, it's working again!!!

Can't thank you enough for having the patience to troubleshoot and follow up on this with me.


Great to hear 🎉
The .NET error was reported and fixed in a recent runner release from GitHub: https://github.com/actions/runner/releases.
This runner software is included in my image: https://github.com/evryfs/github-actions-runner/blob/master/Dockerfile#L3. Therefore I recommend running the tagged versions: https://quay.io/repository/evryfs/github-actions-runner?tab=tags. If you go YOLO with master/latest, be sure to set imagePullPolicy to Always.
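A sketch of the "YOLO" variant for the runner container in the pod template. Kubernetes already defaults `imagePullPolicy` to `Always` for the `:latest` tag (or no tag), but not for other mutable tags such as `:master`, where the default is `IfNotPresent`; so there it has to be set explicitly. Pinning a tag from the Quay page avoids the issue entirely.

```yaml
containers:
  - name: runner
    # Tracking a moving tag: without Always, a node that already has an
    # older "master" image cached would keep running it (IfNotPresent).
    image: quay.io/evryfs/github-actions-runner:master
    imagePullPolicy: Always
```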

Happy building! ;-)

Hi @davidkarlsen @Anusha-Kolli
I'm seeing the same error: pods are crashing with "Must define RUNNER_TOKEN variable".

I created a secret:
kubectl create secret generic github-runner-app --from-literal=GITHUB_APP_INTEGRATION_ID=*** --from-file=GITHUB_APP_PRIVATE_KEY=***.pem -n github-actions-runner-operator

and referenced it:
envFrom:
  - secretRef:
      name: github-runner-app

I'm not using a PAT; I want to use the GitHub App authentication method.

I believe this is for the PAT:
tokenRef:
  key: GH_TOKEN
  name: actions-runner

Using a PAT worked fine, but with a GitHub App, creating the secret and referencing it from envFrom fails.

Where do I define the RUNNER_TOKEN variable?

Appreciate any kind of help here

deeco commented


I am also facing this exact issue. I don't want to use PATs, since they are tied to a user and people leave, and I can't use service accounts or long-lived tokens either.

For GitHub App authentication, the config needs to be passed to the operator itself: https://github.com/evryfs/helm-charts/blob/master/charts/github-actions-runner-operator/values.yaml#L70 https://github.com/evryfs/helm-charts/blob/master/charts/github-actions-runner-operator/templates/deployment.yaml#L34
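A minimal sketch of that idea, assuming you patch or template the operator Deployment yourself instead of using the chart's values: the GitHub App secret created earlier in this thread (`github-runner-app`) is wired into the operator container rather than the runner pods. The container name below is hypothetical, and the exact keys the operator expects should be checked in the linked `values.yaml`/`deployment.yaml`.

```yaml
# Excerpt of the operator Deployment's pod template (not the runner CR).
spec:
  template:
    spec:
      containers:
        - name: manager            # hypothetical container name
          envFrom:
            # GitHub App credentials (GITHUB_APP_INTEGRATION_ID,
            # GITHUB_APP_PRIVATE_KEY, ...) so the operator can request
            # registration tokens without a PAT.
            - secretRef:
                name: github-runner-app
```

The runner pods presumably keep referencing the operator-created `<NameOfRunner>-regtoken` secret through `envFrom`; that secret, not the App credentials, is what supplies `RUNNER_TOKEN`.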
