Self-hosted runner with Docker step creates files that trip up the checkout step
j3parker opened this issue Β· 61 comments
Describe the bug
When using self-hosted runners, git checkouts are cached between runs (this is nice, because it greatly speeds up our builds).
However, if a docker-based step writes a file to the workspace it will (possibly) be owned by the root user. If the permissions don't give the (non-root) action runner user +w permission then a checkout step in a future workflow run will fail to remove this file. The first time, the error will look like this:
##[group]Cleaning the repository
[command]/usr/bin/git clean -ffdx
warning: could not open directory 'foo/': Permission denied
warning: failed to remove foo/: Directory not empty
##[endgroup]
##[warning]Unable to clean or reset the repository. The repository will be recreated instead.
Deleting the contents of '/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro'
##[error]Command failed: rm -rf "/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro/foo"
rm: cannot remove '/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro/foo': Permission denied
So git clean -ffdx tried to stat() this foo/ directory (created via a container in a previous build) but failed. It was then unable to remove the directory because it wasn't empty. It tried to fall back to rm -rf, which failed for the same reasons.
In future builds it goes straight to rm -rf, because the .git folder did get cleaned up. It continues to fail in the same way for all future builds. Here's a screenshot:
To Reproduce
I've created a repo that reproduces the error: https://github.com/Brightspace/self-hosted-runner-permissions-issue-repro
Here's an example of a workflow failing: https://github.com/Brightspace/self-hosted-runner-permissions-issue-repro/runs/596011452?check_suite_focus=true
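Outside of Actions, the same permission state can be reproduced directly with plain Docker - a hedged sketch, with illustrative paths:
# A container writing to a bind-mounted workspace as root leaves behind
# a directory the (non-root) runner user cannot remove.
docker run --rm -v "$PWD:/w" -w /w alpine \
  sh -c 'mkdir -p foo && touch foo/output-file && chmod 700 foo'
ls -ld foo   # drwx------ 2 root root ... foo
rm -rf foo   # rm: cannot remove 'foo': Permission denied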
Expected behavior
I guess I'd expect all the files to be owned by the runner user... in a perfect world. Maybe that could be done with user namespace remapping (documentation)? Not sure what that would entail, though, or if it makes sense for what the runner is doing.
I think this is not an issue with the checkout action, because I don't think there is anything it could do about it - it would impact other actions too; checkout was just the first one I hit the issue with.
Runner Version and Platform
Ubuntu 18.04, runner version 2.168.0
These are org-level runners but I imagine it's not specific to that.
@j3parker thanks for reporting this. I am also facing the same issue.
@TingluoHuang I thought to start the runner as root, but the utility run.sh has a check to not start the runner as root: https://github.com/actions/runner/blob/master/src/Misc/layoutroot/run.sh#L3
# Validate not sudo
user_id=`id -u`
if [ $user_id -eq 0 -a -z "$RUNNER_ALLOW_RUNASROOT" ]; then
echo "Must not run interactively with sudo"
exit 1
fi
EDIT: @j3parker I think you can simply achieve this by exporting the variable RUNNER_ALLOW_RUNASROOT=1. Check here.
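As a minimal sketch (assuming a default runner install directory), the override from the check above looks like:
# Start the runner as root; run.sh only refuses when
# RUNNER_ALLOW_RUNASROOT is unset.
sudo RUNNER_ALLOW_RUNASROOT=1 ./run.sh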
Thank you @TingluoHuang
If you are going to use container-based actions, I would recommend using a job container as well. Mixing container and host environments does not work very well. The other option is to add a step to change the permissions of files after the container action runs.
Also, I don't think we can say the runner must run as root, since many folks run it as a systemd service, and I don't think that will allow us to run as root.
@karancode The reason that's an env var is that there are many scenarios (e.g. running as a service) where it won't work, because the service ends up running as a user (the one configured or specified). But there are some scenarios (like putting the runner in a container) where you want or need to explicitly run as root, and that's why the override is there. We should formalize run-as-root more and document it better.
We need to keep this open and think a bit more about what the right solution is. If you want to run as root for now and you don't run the runner as a systemd service, then make sure that however you launch it, it calls runsvc.sh and not run.sh, so it doesn't exit on updates.
@chrispat @bryanmacfarlane Thanks.
In my case, I am running the runner inside containers, so I had no other option but to run it as root. (I tried changing the permissions of files after container actions, but that isn't feasible, as there could be many different actions generating multiple files. I also tried running it as a systemd service, but that was just uglier.)
I also faced the issue with run.sh exiting on updates (observed with minor version updates but not with patch updates).
Thanks for the tip, I will check out runsvc.sh and how to use it.
PS: If there's any doc, please share - would love your contribution. Thanks!
If you are going to use container-based actions, I would recommend using a job container as well. Mixing container and host environments does not work very well.
Cool - when you say job containers, are you referring to this? I hadn't seen that before... I'll definitely try that!
We need to keep this open and think a bit more on what the right solution is.
Thanks! It looks like container: will mitigate this for us for now. If the runner itself messed around with user namespaces, that might be able to solve this (they can be nested), but it might be a bunch of work...
If container: works for us, we'd be interested in being able to configure our runners to only accept containerized jobs. I can open a feature request for this after testing it out.
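For reference, a minimal job-container sketch along those lines (the image and build command are placeholders, not from this thread):
jobs:
  build:
    runs-on: self-hosted
    # With a job container, every step (including docker-based actions)
    # runs inside the same container, keeping file ownership consistent
    # so the next run's checkout can clean the workspace.
    container:
      image: ubuntu:20.04
    steps:
      - uses: actions/checkout@v2
      - run: ./build.sh   # placeholder for the real build steps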
Another option here is that when running inside a container that has the user as root, you can use the jobs.<jobid>.container.options directive to provide the --user uid:gid value of the user and group that the self-hosted action runner is running as.
The downside to this is that details of the actions runner environment start to leak into the workflows of your projects, which is less than ideal in larger companies.
This is hitting me as well; at first it was easier to avoid, but now that we have more and more actions it is getting harder.
Is there a way to have the runner clean up the work directory after the workflow is finished? If the runner isn't running as root, then it probably can't delete the directory. But it would be able to do that if it was running in a container.
Maybe there is a way to turn on a cleanup job that will run after the workflow is complete to delete the files. If that job is run in a container, it should have access to delete everything.
This only works if you have Docker installed, but that is fine because I don't think the issue happens without Docker.
I guess I could add this to all of my workflows, but that doesn't seem ideal; getting it added at the runner level makes things cleaner. Just an idea - not sure if it is a good one, but I figured I would add it here and see what people think.
For all my private actions, I ended up putting USER 1000:1000 in all the Docker actions. Since there is only ever root and the user I created, the only valid options are 0 and 1000. That way, any files created within the Docker action that are persisted by the self-hosted runner have the correct user.
The other solution is to just run the runner as root by setting RUNNER_ALLOW_RUNASROOT=1.
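A minimal sketch of that Dockerfile change (the base image and entrypoint are illustrative assumptions):
FROM ubuntu:20.04
COPY entrypoint.sh /entrypoint.sh
# Create a user whose uid/gid match the self-hosted runner user, so files
# the action writes to the mounted workspace stay removable by the runner.
RUN groupadd -g 1000 runner && useradd -m -u 1000 -g 1000 runner
USER 1000:1000
ENTRYPOINT ["/entrypoint.sh"]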
In your workflow file, use this, e.g.:
build:
  runs-on: self-hosted
  needs: [clone-repository]
  container:
    image: gradle:5.5.1-jdk11
    options: --user 1000
  ...
I suppose that will only work if you're using container vs a Docker action.
I don't believe an action has those concepts unless specifically written by the action. In which case, it would need to get updated in every action that uses Docker (extreme example). Unless there is a universal environment variable that masks files or sets file creation permissions. (I suppose I'm thinking of something similar to UMASK here; not sure really.)
Yes, for Docker actions (private or public), the user will need to modify the corresponding Dockerfile by indeed having USER 1000:1000 like you mentioned earlier.
The nice thing is that GitHub Actions is indirectly enforcing good practices - i.e. do not run a container as root.
I personally think that it is the user's responsibility to set the right permissions there.
Concerning container, to avoid repeating options: --user 1000, it would be nice to be able to define it like this:
defaults:
  runs-on: self-hosted
  container-user-id: 1000
I found this temporary workaround for Docker actions: Fix for self-hosted runner
P.S.: I updated the script and moved it to the repository README.md.
You might not be able to run with sudo, but you can add the user to the root group. That worked for me :)
sudo usermod -a -G root <USER_NAME>
You might not be able to run with sudo, but you can add the user to the root group. That worked for me :)
sudo usermod -a -G root <USER_NAME>
Strange... I did this before and it still gave me problems. I can try again in the future.
Thanks!
@jef I rewrote my script. Now it is possible to set docker's user option via env vars. You can find more details here: https://github.com/xanantis/docker-file-ownership-fix#to-fix-all-those-above-for-self-hosted-runner
Hey! I faced the same problem. Have you guys found an easy solution?
Hey! I faced the same problem. Have you guys found an easy solution?
This worked for me...
I had to manually remove the conflicting files, and then adding the user ID of the user we use for automation did the trick.
Setting the uid is not always possible, depending upon the container, how it was built, and the internal permissions inside the container. It is a good approach most of the time, but there are edge cases, and in use cases where separate teams manage the runners and the workflows, complications can arise.
I have created an action that can "reset" permissions on the workspace directories and files that would trip up a consecutive run and break at checkout: https://github.com/peter-murray/reset-workspace-ownership-action
It is still not a perfect solution, but it can be appended to the end of the workflow with minimal impact and overhead until there is a longer-term fix available.
Also, I don't think we can say the runner must run as root, since many folks run it as a systemd service, and I don't think that will allow us to run as root.
Ran into this myself today due to running Python in Docker containers during a test step, creating root-owned __pycache__ files. These files then break the next build when the runner attempts removal during the checkout step, like OP.
Are there any issues with running the service as root using the [user] param for install? I'm hosting a runner on Ubuntu 20.04.1, and running:
sudo ./svc.sh install root
sudo ./svc.sh start
works well for me as a workaround for now.
+1. This has also come up where users are using github/codeql-action (for Code Scanning) or other actions that write to runner.temp. In that case it's possible for the action to write data to the temp directory that fails to be cleaned up at the start of a next run, because the container user doesn't have permission to delete it. Documenting the right practice of getting the users to match would be a good way to help identify and prevent this.
+1, and the documentation for actions explicitly states that containers should be run as root: https://docs.github.com/en/free-pro-team@latest/actions/creating-actions/dockerfile-support-for-github-actions. Yet if you follow this, the builds will break, unless you're running the GitHub runner itself as root, I suppose.
Stating that containers should be run as root is unbelievably insecure and lazy, and violates the entire concept of process separation. If I don't carefully audit all third-party actions, I have a strong possibility of opening my network to unknown damage. That's simply unacceptable under any circumstances.
Setting the uid is not always possible, depending upon the container, how it was built, and the internal permissions inside the container. It is a good approach most of the time, but there are edge cases, and in use cases where separate teams manage the runners and the workflows, complications can arise. I have created an action that can "reset" permissions on the workspace directories and files that would trip up a consecutive run and break at checkout: https://github.com/peter-murray/reset-workspace-ownership-action
It is still not a perfect solution, but it can be appended to the end of the workflow with minimal impact and overhead until there is a longer-term fix available.
Thank you, this workaround action helped solve this problem for us.
A few weeks ago I created an example to run on Ubuntu with rootless Docker. Still testing the setup, but it should avoid the root problem, since the Docker user mapping is fixed.
I created a guide based on @npalm's example on how to run the GitHub Actions runner with rootless Docker: https://stackoverflow.com/questions/66137419/how-to-enable-non-docker-actions-to-access-docker-created-files-on-my-self-hoste
I actually stumbled upon another error. It mostly seems to work, hence the guide, but the new v2 docker build-push action, which uses buildx, fails with:
buildx call failed with: error: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: write sysctl key net.ipv4.ping_group_range: write /proc/sys/net/ipv4/ping_group_range: invalid argument: unknown
Edit: filed issue here docker/build-push-action#292
Managed to solve that by adding driver: docker:
- uses: docker/setup-buildx-action@v1
  with:
    driver: docker
Everything now works just as on the ubuntu-latest runners for me! Again, thanks to npalm.
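For reference, a rough sketch of the rootless Docker setup (hedged; see the linked guide and the upstream rootless docs for the authoritative steps):
# Install rootless Docker for the (non-root) runner user on Ubuntu.
sudo apt-get install -y uidmap
curl -fsSL https://get.docker.com/rootless | sh
# Start the user-scoped daemon and point docker clients at its socket.
systemctl --user enable --now docker
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock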
Cleaning up after every docker step actually solves this issue.
runs-on: self-hosted
container:
  image: python:3.8
steps:
  - uses: actions/checkout@v2
  - name: <DO STUFF WITH CODE>
    run: <DO STUFF WITH CODE>
  - name: if the above step failed
    if: ${{ failure() }}
    run: rm -rf ..?* .[!.]* *
  - name: clean
    run: rm -rf ..?* .[!.]* *
@jef If you have a docker step that can fail, you can use the above trick as a workaround. Otherwise, always clean up after every docker step.
Hi there,
I'm experiencing a different but related issue.
My workflow is not container based, but as soon as I have a single step based on a dockerized action, like github/super-linter, all the files are owned by root after this step.
And all non-docker steps which run afterwards use the runner user. So this is definitely a bug.
For self-hosted runners the solution is RUNNER_ALLOW_RUNASROOT=1.
@rvoitenko That's not a different issue - that's literally this issue, and both my solution above with rootless Docker and your solution with run-as-root will work.
@Frederik-Baetens OK, but these solutions don't apply to managed runners when you don't have a container: job, only steps based on dockerized third-party actions. So something needs to be patched on GitHub-managed runners.
As far as I know, no such problems exist on the managed runners, because on the managed runners the actions run as root, thereby avoiding these ownership problems.
As far as I know, no such problems exist on the managed runners, because on the managed runners the actions run as root, thereby avoiding these ownership problems.
They don't run as root; they run as a user named "runner".
Although this particularly impacts self-hosted runners that re-use workspaces (the only option until #510 is solved), it isn't really specific to self-hosted runners. Here's an example (not terribly realistic) workflow file that uses the GitHub-hosted runners:
on: push
jobs:
  repro:
    runs-on: ubuntu-latest
    steps:
      - run: whoami
      - uses: actions/checkout@v2
      # It should work a second time
      - uses: actions/checkout@v2
      - name: Run a container that outputs to foo/output-file and puts nasty permissions on foo
        uses: ./
      - name: Print permissions for all files
        run: ls -alFR || true
      # Fails to git clean the foo folder, tries to rm -rf the checkout and then fails
      - uses: actions/checkout@v2
Here's what happens if you run it:
In the "print permissions" step it prints out stuff like:
Run ls -alFR || true
.:
total 28
drwxr-xr-x 5 runner docker 4096 May 3 12:21 ./
drwxr-xr-x 3 runner docker 4096 May 3 12:21 ../
drwxr-xr-x 8 runner docker 4096 May 3 12:21 .git/
ls: cannot open directory './foo': Permission denied
drwxr-xr-x 3 runner docker 4096 May 3 12:21 .github/
-rw-r--r-- 1 runner docker 82 May 3 12:21 Dockerfile
-rw-r--r-- 1 runner docker 374 May 3 12:21 README.md
drwx------ 2 root root 4096 May 3 12:21 foo/
...
(note the error descending into ./foo)
Yes, exactly. This is what will happen by default (either with hosted runners or self-hosted runners using the documented setup steps).
Here is the Dockerfile for the repro.
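The linked Dockerfile isn't reproduced in this thread; a hypothetical equivalent that produces the root-owned drwx------ foo/ above could be:
FROM alpine:3
# Runs as root inside the container; on the bind-mounted workspace the
# resulting directory is root-owned with no group/other permissions,
# which is exactly what later trips up git clean and rm -rf.
ENTRYPOINT ["/bin/sh", "-c", "mkdir -p foo && echo hi > foo/output-file && chmod 700 foo"]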
I ran into this issue, but it could have easily been avoided if the docs were better. I followed instructions from https://docs.github.com/en/actions/hosting-your-own-runners/adding-self-hosted-runners, which ultimately brought me to a URL https://github.com/**my-org**/**my-repo**/settings/actions/runners/new. On that screen the docs say:
# Create the runner and start the configuration experience
$ ./config.sh --url https://github.com/farmerstoyou/rails_app --token ABCD
# Last step, run it!
$ ./run.sh
The ./run.sh cannot be run as root, as many have pointed out. But there is another script in the same dir that WILL start the runner as a systemd service running under root. The above docs should add this:
# Run it as a systemd service:
$ ./svc.sh install
$ ./svc.sh start
@dhughesbc Cool, working fine for me like magic; this should become "best practice".
./svc.sh start must be run as root, but that doesn't run the process as root - it still runs as the current user. The issue persists.
@klausbadelt just use rootless docker and you won't have any more problems, without having to run anything as root.
I had the same problem in a workflow that builds and pushes a Docker image. A trick I use to circumvent this problem is a docker-in-docker approach. This may only work on self-hosted runners (where I have used and tested the workflow so far). It mounts the runner's docker.sock and runs the docker commands in its own docker:stable container.
Example workflow:
jobs:
  build-push-base-image:
    name: Build and push the image
    runs-on: [ self-hosted, ubuntu ]
    container:
      image: docker:stable
      volumes:
        # Mount the docker sock for host's docker engine to be usable inside container
        - /var/run/docker.sock:/var/run/docker.sock
    defaults:
      run:
        # Force sh since bash is not supported in docker:stable
        shell: sh
    steps:
      - uses: actions/checkout@v2
      - name: Login to Docker Repo
        uses: docker/login-action@v1
        with:
          registry: ...
          username: ...
          password: ...
      - name: Build, tag and push the image
        run: |
          docker build -t ${{ env.DOCKER_IMAGE_NAME }}:${{ env.DOCKER_IMAGE_TAG }} .
          docker tag ${{ env.DOCKER_IMAGE_NAME }}:${{ env.DOCKER_IMAGE_TAG }} ${{ env.DOCKER_REPO }}/${{ env.DOCKER_ORG }}/${{ env.DOCKER_IMAGE_NAME }}:${{ env.DOCKER_IMAGE_TAG }}
          docker push ${{ env.DOCKER_REPO }}/${{ env.DOCKER_ORG }}/${{ env.DOCKER_IMAGE_NAME }}:${{ env.DOCKER_IMAGE_TAG }}
Similar problem here: actions with global tooling like actions/setup-go@v2 can be run without a container (as the runner user) and from a container in different jobs.
This can break file permissions, and there's no easy way to fix permissions automatically after the job is finished, except doing it by hand.
Having the same issue when devs are building Docker images.
If Dockerfiles don't explicitly drop permissions and are using volumes to the host, files can be created with UID 0.
This causes GitHub's cleanup workflow to fail, as it doesn't have permission to delete root-owned files.
For example, note the .mypy_cache directory below:
$ pwd
/home/ec2-user/actions-runner/_work/my_cool_app/src/glue/app
$ ls -la
total 44
drwxr-xr-x 6 ec2-user ec2-user 200 Sep 27 23:46 .
drwxr-xr-x 3 ec2-user ec2-user 17 Sep 27 23:44 ..
-rw-r--r-- 1 ec2-user ec2-user 308 Sep 27 23:44 Dockerfile
drwxr-xr-x 4 ec2-user ec2-user 98 Sep 27 23:44 app
-rw-r--r-- 1 ec2-user ec2-user 2546 Sep 27 23:44 app_run.py
drwxr-xr-x 3 root root 76 Sep 27 23:46 .mypy_cache
It would be great if GitHub had some global (organisation-wide) defaults that get applied to all repos.
Or a post step for the checkout action that always cleans up at the end of the job that started it.
@rajbos But if the root-owned files were created by a Docker container running with a volume (bind mount) without correctly mapping the user to the host, then the cleanup step may not have permission to delete the files.
I suppose you could add a sudoers rule to allow rm -rf on any files under the worker path, but it would be nice if GitHub could make it easier to globally (org-wide) configure defaults for all workflows.
@sammcj You can use the same "feature" that created the files to remove or chown them, i.e. a docker-based action step, which is what this action step does: https://github.com/marketplace/actions/reset-workspace-ownership-action. It runs as root inside the container and resets the owner to the specified uid, which would be that of the actions runner user, for instance.
You can add that as a post step that will always run as the last step in your job. Yes, it is far from a perfect solution, but it gives you control whereby you do not need to over-privilege the runner user account by default, and you cannot easily influence or control these docker-based action containers.
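A generic version of that idea (not the linked action's actual interface - the uid and workspace path here are assumptions) is a final docker step that chowns the workspace back to the runner's user:
- name: Reset workspace ownership
  if: ${{ always() }}
  # Runs as root inside the container, like the steps that created the
  # root-owned files; 1000:1000 is assumed to be the runner user's uid/gid.
  uses: docker://alpine
  with:
    args: chown -R 1000:1000 /github/workspace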
For those of you who can use rootless Docker, ScribeMD offers the rootless-docker GitHub action. It has only been tested on ubuntu-20.04 so far, in the interest of incremental progress, but seeing as it is very simple, I am optimistic that it will work on ubuntu-18.04. It technically has a race condition, since it doesn't wait for the Docker daemon to be ready, but regardless, that generally happens much faster than launching a new shell for the next step. If anyone has thoughts on how best to eliminate the race condition, I am all ears.
Everyone's comments above are appreciated greatly. I prefer to run everything in a container to keep the build server's environment as clean as possible, and I ended up with the following at the start of my jobs section:
jobs:
  build_and_test:
    runs-on: [ self-hosted ]
    container:
      image: ubuntu
    steps:
      - name: Clean the workspace
        run: rm -rf $GITHUB_WORKSPACE/*
      - uses: actions/checkout@v2
      ...
If a job container is not specified, you can use this as a cleanup step:
- name: Clean the workspace
  uses: docker://alpine
  with:
    args: /bin/sh -c "rm -rf /github/workspace/.* || rm -rf /github/workspace/*"
A few additional thoughts:
- Coming from drone.io, it's a bit disappointing that environment pollution is even something that needs to be solved manually with GH Actions (with containers), but I don't think that starting the runner as root is the right solution; this likely isn't even an option in many enterprise environments.
- The checkout action really ought to handle the cleanup. If it's running in a container because of a job container specification, it should have the same permissions to modify/delete the files as the commands that created them.
- None of this would be an issue if a temporary docker volume was used instead of volume-mounting the workspace dir on the runner host. Any additional mounted volumes can be specified manually, but if you're running everything in a container, I'm not sure why you'd want them. The whole reason I want to run everything in a container is to start with a clean slate, not a workspace contaminated by other builds.
- The checkout action could even run in a container itself, preventing the need to have git 2.18+ installed on the runner. (The ability to specify a cert bundle would be critical for GHES customers though.)
The aforementioned race condition has been fixed in rootless-docker@0.1.1. I share @jsmartt's perspective that running the runner as root is less desirable. I would expect using rootless Docker to work on most self-hosted Linux images, and I believe the outcome is essentially the same as cleaning the workspace since the runner cleans up after itself when it has permission to do so.
You might not be able to run with sudo, but you can add the user to the root group. That worked for me :)
sudo usermod -a -G root <USER_NAME>
This works for me as well
sudo usermod -a -G root <USER_NAME>
Thank you :-) This worked for me.
Are people still having this issue? I've been able to fix my issue by doing what was suggested in #434 (comment), but it feels a bit gross to run the service as root.
My issue is that after running a black formatting action (https://github.com/psf/black), there are files left over in the _actions dir that aren't owned by the runner's user but rather by root:
Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/lib/python3.10/site-packages/black-23.3.0.dist-info/INSTALLER' is denied.) (Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/pyvenv.cfg' is denied.)
(Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/lib64' is denied.) (Access to the path '/home/ubuntu/actions-runner/_work/_actions/psf/black/stable/.black-env/lib/python3.10/site-packages/mypy_extensions.py' is denied.)
This causes subsequent actions to fail with the above message.
Yep, still happens with a few different clients of mine.
Instead of running the service as root, you can use ScribeMD/rootless-docker to run Docker in rootless mode.
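A hedged usage sketch (the version pin follows the earlier comment; check the action's README for its current inputs):
steps:
  # Start a rootless Docker daemon before any docker-based steps, so files
  # written by containers are owned by the runner user, not root.
  - uses: ScribeMD/rootless-docker@0.1.1
  - run: docker info   # subsequent docker commands hit the rootless daemon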
Not 100% sure this is relevant, sorry, but to work around what I think is a similar problem, I wrote a small script that dynamically creates a user inside the container with an ID that matches the user ID outside: https://stackoverflow.com/a/74330112
In that case, the problem being solved is sharing files created inside the container with the host.
It's ugly, but surprisingly effective at solving what seems to be a Docker design issue.
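A minimal sketch of that idea (hypothetical names; assumes an Alpine-based image with the su-exec package installed and the host IDs passed in via HOST_UID/HOST_GID):
#!/bin/sh
# Create a user inside the container that mirrors the host user, then
# drop privileges so files written to bind mounts are owned correctly
# on the host.
HOST_UID="${HOST_UID:-1000}"
HOST_GID="${HOST_GID:-1000}"
addgroup -g "$HOST_GID" hostgroup 2>/dev/null || true
adduser -D -u "$HOST_UID" -G hostgroup hostuser 2>/dev/null || true
exec su-exec hostuser "$@"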
If anyone is having the problem I mentioned above with black, I've actually had a PR just merged that will remove all files created by black during the action.