myoung34/docker-github-actions-runner

Cannot Checkout Code When Running Jobs Inside a Container

ukewea opened this issue · 3 comments

ukewea commented

Hello Team,

I created a few instances of the runner using Docker Swarm. When running jobs inside a container (i.e., jobs with container: defined in the workflow), those jobs fail to check out code with the following error:

OCI runtime exec failed: exec failed: unable to start container process: exec: "/__e/node20/bin/node": stat /__e/node20/bin/node: no such file or directory: unknown

I want to troubleshoot this problem. Could you provide some insight or things I can try?

I appreciate your hard work on this project!


Here's the full log of the "Checkout code" step:

2023-09-26T03:21:30.5280980Z ##[group]Run actions/checkout@v4
2023-09-26T03:21:30.5281277Z with:
2023-09-26T03:21:30.5281552Z   repository: org_name/dotnet-cicd-job-container-eval
2023-09-26T03:21:30.5281960Z   token: ***
2023-09-26T03:21:30.5282159Z   ssh-strict: true
2023-09-26T03:21:30.5282368Z   persist-credentials: true
2023-09-26T03:21:30.5282584Z   clean: true
2023-09-26T03:21:30.5282818Z   sparse-checkout-cone-mode: true
2023-09-26T03:21:30.5283037Z   fetch-depth: 1
2023-09-26T03:21:30.5283228Z   fetch-tags: false
2023-09-26T03:21:30.5283429Z   show-progress: true
2023-09-26T03:21:30.5283617Z   lfs: false
2023-09-26T03:21:30.5283930Z   submodules: false
2023-09-26T03:21:30.5284142Z   set-safe-directory: true
2023-09-26T03:21:30.5284348Z ##[endgroup]
2023-09-26T03:21:30.5408564Z ##[command]/usr/bin/docker exec  08bbc0dfa7d1fb05b826689db7c7eea05125ba3edec5d598bdb44fcf754d1679 sh -c "cat /etc/*release | grep ^ID"
2023-09-26T03:21:30.7137042Z OCI runtime exec failed: exec failed: unable to start container process: exec: "/__e/node20/bin/node": stat /__e/node20/bin/node: no such file or directory: unknown

Here's the docker-compose.yml I use to deploy multiple runner instances:

version: '3'

services:
  runner:
    image: myoung34/github-runner:latest
    environment:
      ACCESS_TOKEN: 'ghp_****************************'
      RUNNER_SCOPE: 'org'
      ORG_NAME: 'org_name'
      RUNNER_GROUP: 'runner-farm'
      RUNNER_NAME: 'containerized_{{.Task.Slot}}'
      RUNNER_WORKDIR: '/tmp/runner/work_{{.Task.Slot}}'
      EPHEMERAL: '1'
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock'
      # - '/tmp/runner:/tmp/runner'
    deploy:
      replicas: 3
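
For reference, the {{.Task.Slot}} placeholders are Swarm service template parameters, so this file is deployed as a stack rather than with plain docker compose. A minimal deploy command (the stack name runners is just an example):

docker stack deploy -c docker-compose.yml runners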

Here's the workflow for the CI job:

name: .NET Core CI/CD in Containerized Environment

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: self-hosted

    container:
      image: mcr.microsoft.com/dotnet/sdk:6.0

    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Restore dependencies
      run: dotnet restore

    - name: Build solution
      run: dotnet build --configuration Release --no-restore

    - name: Run unit tests
      run: dotnet test --no-restore --verbosity normal

This is a question for the upstream actions team; this repo is only concerned with installing the runner itself.

ukewea commented

Thanks for the hint. After some digging, it turns out to be an issue with how the job container is created.

In the "Initialize containers > Starting job container" step, by looking at the command that used to start a new container, we can notice that it mapped /actions-runner/externals on my host machine to /__e in the containers that runs the job, but /actions-runner/externals only exists in the container that the runner was in, not on the host machine.

Raw text of the docker create command:

/usr/bin/docker create --name 7954e543b87540929d1873e3be523ef2_******dotnet_sdk_nodejs6020x_c0b4e8 --label c3f261 --workdir /__w/dotnet-cicd-eval/dotnet-cicd-eval --network github_network_69a556d20d5b4b48a1955f3b83e167b5  -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/tmp/runner/work_2":"/__w" -v "/actions-runner/externals":"/__e":ro -v "/tmp/runner/work_2/_temp":"/__w/_temp" -v "/tmp/runner/work_2/_actions":"/__w/_actions" -v "/opt/hostedtoolcache":"/__t" -v "/tmp/runner/work_2/_temp/_github_home":"/github/home" -v "/tmp/runner/work_2/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" ******/dotnet_sdk_nodejs:6.0-20.x "-f" "/dev/null"
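
This explains the error: when docker is given a bind-mount source (via -v) that does not exist on the host, it creates it as an empty directory, so /__e inside the job container exists but is empty, and /__e/node20/bin/node is missing. A quick way to confirm the mismatch (the container ID is a placeholder; use whatever docker ps shows for your runner):

# On the Swarm node itself: the bind-mount source does not exist
ls /actions-runner/externals
# ls: cannot access '/actions-runner/externals': No such file or directory

# Inside the runner container: the same path exists and contains node20
docker exec <runner-container-id> ls /actions-runner/externals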

I have a possible workaround: it looks like it is possible to override the paths of the volumes mapped into job containers by creating a json file. I'll give it a try and report back in the next few days.

@ukewea great discovery!! I too was having the same issue checking out code when using containers for my jobs. However, my setup is within Kubernetes: my pod has two containers, the runner and a Docker-in-Docker container used to run all docker commands/actions, and the underlying file structure is shared through a set of volumes. Sure enough, my Docker-in-Docker container did not have the /actions-runner/ directory, as you discovered above.

I was able to use an initContainer to copy the data from the /actions-runner/ dir in the image to my shared volume. Once both containers spun up, they each had access to the entirety of the /actions-runner/ directory. Confirmed code checkout then worked without a hitch 👍

initContainers:
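  # Copy the runner's bundled files (including externals/) into the shared
  # volume so both main containers see the same /actions-runner tree.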
  - name: configure-actions-runner-dir
    image: myoung34/github-runner:latest
    command: ["/bin/bash", "-c"]
    args:
      - |
        cp -R /actions-runner/* /actions-runner-volume/
    securityContext:
      runAsUser: 0
    volumeMounts:
      - mountPath: /actions-runner-volume/
        name: actions-runner
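
For completeness, a minimal sketch of how the shared volume might be wired into the two main containers; the container names, images, and the emptyDir volume are assumptions, adapt them to your pod spec:

containers:
  - name: runner
    image: myoung34/github-runner:latest
    volumeMounts:
      # Shadowed by the shared volume; populated by the initContainer above.
      - mountPath: /actions-runner/
        name: actions-runner
  - name: dind
    image: docker:dind
    volumeMounts:
      # The dind daemon resolves bind-mount sources here, so the path must exist.
      - mountPath: /actions-runner/
        name: actions-runner
volumes:
  - name: actions-runner
    emptyDir: {}

Because the volume is mounted over /actions-runner in both containers, the initContainer copy above is what populates it before the main containers start.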