myoung34/docker-github-actions-runner

No space left on device

basz opened this issue · 6 comments

basz commented

Hi,

I run these via docker compose for several repositories. I regularly encounter jobs failing due to:

No space left on device : '/actions-runner/_diag/Worker_20230918-091841-utc.log'

When this happens I need to delete the container and start over. After some time it happens again.

I don't understand this because I have EPHEMERAL set to true and I would assume the container is reset.

docker-compose.yaml

runner-plhw:
        image: 'myoung34/github-runner:ubuntu-jammy'
        deploy:
          replicas: 2
        # command: config.cmd remove
        environment:
            - ACCESS_TOKEN=${GITHUB_TOKEN}
            - EPHEMERAL=true
            - RUNNER_NAME_PREFIX=plhw
            - RUNNER_SCOPE=org
            - DISABLE_AUTO_UPDATE=true
            - ORG_NAME=plhw
        volumes:
            - '/var/run/docker.sock:/var/run/docker.sock'
        restart: always

Looking at that log file I got the idea that maybe the '/_work/' directory should be mounted to a host directory. I have enough space. Would that solve it? Is that safe to do?

edit: I tried a bind mount to /_work. That didn't work: permission denied errors for executables.

edit2: I am guessing point three of https://github.com/myoung34/docker-github-actions-runner/wiki/Usage#ephemeral-runners isn't achievable with docker compose up -d. Containers are restarted but not deleted when finished.
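A quick way to confirm whether compose is reusing the same container across jobs is to compare its creation time with its last start time; if only the start time changes, the container (and its /_work data) survives between jobs. The container name below is illustrative:

docker inspect -f 'created={{.Created}} started={{.State.StartedAt}}' plhw-runner-1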

myoung34 commented

You have a few questions here:

Ephemeral mode is documented here

tried to do a bind mount to /_work. that didn't work. permission denied errors for executables.

You'll need to resolve your permissions, but I don't know what issue you're having specifically.
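If you go back to the bind mount approach, a typical first step is to compare the uid/gid of the runner inside the container with the ownership of the host directory, something like the following (container name and host path are illustrative; the /actions-runner path is taken from the error message above):

docker exec plhw-runner-1 id
docker exec plhw-runner-1 ls -ldn /actions-runner/_work
ls -ldn /srv/runner-work   # host side of the bind mount
# then chown the host directory to match the uid/gid reported inside the container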

No space left on device : '/actions-runner/_diag/Worker_20230918-091841-utc.log'
When this happens I need to delete the container and start over. After some time it happens again.

If you're not using volumes or configuring anything, Docker has a default storage size for containers set to 10 GB or 100 GB depending on your version.
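One way to see how large a container's writable layer has grown before it hits that limit:

docker ps --size
docker system df -v

The SIZE column shows how much data has been written on top of the image for each container.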

My suggestion for your scenario is to see whether ephemeral mode is actually working as expected:
Does it continue to live after processing a job?
Do the start-up logs for the container output "Ephemeral option is enabled", as shown in the following output?

✗ docker run --rm -it -e EPHEMERAL=true myoung34/github-runner:ubuntu-bionic
REPO_URL required for repo runners
Runner reusage is disabled
Ephemeral option is enabled
Configuring
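For a compose deployment, an equivalent check might be (service name taken from the compose file above):

docker compose logs runner-plhw | grep 'Ephemeral option is enabled'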
basz commented
2023-09-19 12:34:09 2023-09-19 10:34:09Z: Running job: run-tests
2023-09-19 12:36:57 2023-09-19 10:36:57Z: Job run-tests completed with result: Succeeded
2023-09-19 12:36:58 √ Removed .credentials
2023-09-19 12:36:58 √ Removed .runner
2023-09-19 12:36:59 Runner reusage is disabled
2023-09-19 12:36:59 Obtaining the token of the runner
2023-09-19 12:36:59 Ephemeral option is enabled
2023-09-19 12:36:59 Disable auto update option is enabled
2023-09-19 12:36:59 Configuring
2023-09-19 12:36:59 
2023-09-19 12:36:59 --------------------------------------------------------------------------------
2023-09-19 12:36:59 |        ____ _ _   _   _       _          _        _   _                      |
2023-09-19 12:36:59 |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
2023-09-19 12:36:59 |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
2023-09-19 12:36:59 |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
2023-09-19 12:36:59 |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
2023-09-19 12:36:59 |                                                                              |
2023-09-19 12:36:59 |                       Self-hosted runner registration                        |
2023-09-19 12:36:59 |                                                                              |
2023-09-19 12:36:59 --------------------------------------------------------------------------------
2023-09-19 12:36:59 
2023-09-19 12:36:59 # Authentication
2023-09-19 12:36:59 
2023-09-19 12:37:01 
2023-09-19 12:37:01 √ Connected to GitHub
2023-09-19 12:37:01 
2023-09-19 12:37:01 # Runner Registration
2023-09-19 12:37:01 
2023-09-19 12:37:01 
2023-09-19 12:37:01 
2023-09-19 12:37:01 
2023-09-19 12:37:01 √ Runner successfully added
2023-09-19 12:37:02 √ Runner connection is good
2023-09-19 12:37:02 
2023-09-19 12:37:02 # Runner settings
2023-09-19 12:37:02 
2023-09-19 12:37:02 
2023-09-19 12:37:02 √ Settings Saved.
2023-09-19 12:37:02 
2023-09-19 12:37:04 
2023-09-19 12:37:04 √ Connected to GitHub
2023-09-19 12:37:04 
2023-09-19 12:37:04 Current runner version: '2.308.0'
2023-09-19 12:37:04 2023-09-19 10:37:04Z: Listening for Jobs

Perhaps the container should clear /_work when doing '√ Removed .runner' as the container is not removed in a docker compose context?

myoung34 commented

No, clearing containers is Docker's job.
Everything here is correct. You'll need to inspect the size of the container when it grows too large and gets killed, but I suspect it's growing too large and hitting the maximum container size.

Does it exit when it finishes a job?
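With a restart policy in place, the container's restart count is one quick indicator (container name is illustrative):

docker inspect -f '{{.RestartCount}}' plhw-runner-1

A value that increments after each job suggests the same container is exiting and being restarted rather than recreated.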

basz commented

Yes, according to the Docker dashboard the 'last started' column indicates a fresh start...

But I believe you might be missing something.

When running containers via docker compose, they are not deleted after they exit; there is no --rm option in play. The restart: always option will restart the container, but then it is not a new (empty) container. And since the container keeps writing data to a random directory under '/_work/', that keeps growing and it will hit the max container size...
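The growth is easy to confirm on a long-lived runner container, e.g. (container name is illustrative; the /actions-runner path is taken from the error message above):

docker exec plhw-runner-1 du -sh /actions-runner/_work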

myoung34 commented

It's not the purpose of this container to try to deal with orchestrator edge cases.

You could resolve this by running docker-compose rm on a cron, or by using a mounted volume and clearing the data for runners that are no longer running.
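A rough sketch of the cron approach (path and schedule are illustrative, and --force-recreate is one variant of the rm-and-recreate idea): recreate the runner containers nightly so their writable layers are discarded. Note this will interrupt any job running at that moment.

0 3 * * * cd /srv/runners && docker compose up -d --force-recreate

An alternative is docker compose rm -sf followed by docker compose up -d, which stops, removes, and recreates the services.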

The container here is only concerned with running and stopping when expected

basz commented

Aha, I see, thanks.
I've written a script that helps me get beyond this.