Rootless podman using --device and --group-add keep-groups not working as expected
KCSesh opened this issue · 15 comments
Description
I am trying to understand how to properly use --device in a rootless podman container.
Currently, when I add a device to the rootless container, I see that the device is owned by: nobody nogroup
$ ls -la /dev/
...
crw-rw---- 1 nobody nogroup 505, 1 Apr 26 18:32 nvhost-as-gpu
...
I have seen this in the troubleshooting guide: https://github.com/containers/podman/blob/master/troubleshooting.md#20-passed-in-device-cant-be-accessed-in-rootless-container
But this is only a solution for crun. Is there one for runc?
I have pulled the latest podman and have attempted to use --device (http://docs.podman.io/en/latest/markdown/podman-run.1.html#device-host-device-container-device-permissions) together with --group-add keep-groups.
But this does not seem to change the behavior: I still see that the device is owned by nobody nogroup.
I believe this issue is preventing me from accessing my GPU in a rootless container.
See here if you want specific details: NVIDIA/nvidia-container-runtime#85 (comment)
What are my options? Do I need to migrate to crun? Will that work? Should this be working with runc and --group-add?
Steps to reproduce the issue:
- podman run -it --device </dev/some-mnt>:</dev/some-mnt> --group-add keep-groups
- $ ls -la /dev
- Output will show device is owned by nobody nogroup
- I have also tried with --group-add video, with no luck either.
Describe the results you received:
$ ls -la /dev/
...
crw-rw---- 1 nobody nogroup 505, 1 Apr 26 18:32 nvhost-as-gpu
...
Describe the results you expected:
I would expect to be able to see the video group.
$ ls -la /dev/
...
crw-rw---- 1 nobody video 505, 1 Apr 26 18:32 nvhost-as-gpu
...
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
$ podman --version
podman version 3.2.0-dev
Output of podman info --debug:
podman --storage-driver=vfs --root /data/podman-root/ --runroot /data/podman-run-root/ info --debug
host:
  arch: arm64
  buildahVersion: 1.20.1-dev
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/local/libexec/podman/conmon
    version: 'conmon version 2.0.28-dev, commit: 3770524c7d9c95fe703460a9168350ee5db7be03'
  cpus: 8
  distribution:
    distribution: tegra-ubuntu
    version: "18.04"
  eventLogger: file
  hostname: ubuntu
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.9.140
  linkmode: dynamic
  memFree: 27120275456
  memTotal: 33338081280
  ociRuntime:
    name: runc
    package: 'runc: /usr/sbin/runc'
    path: /usr/sbin/runc
    version: 'runc version spec: 1.0.1-dev'
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /data/downloads/slirp4netns/slirp4netns
    package: Unknown
    version: |-
      slirp4netns version 1.1.9
      commit: 4e37ea557562e0d7a64dc636eff156f64927335e
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.3.3
  swapFree: 16669016064
  swapTotal: 16669016064
  uptime: 45h 22m 54.28s (Approximately 1.88 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /home/<username>/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /data/podman-root
  graphStatus: {}
  imageStore:
    number: 0
  runRoot: /data/podman-run-root
  volumePath: /data/podman-root/volumes
version:
  APIVersion: 3.2.0-dev
  Built: 1619474073
  BuiltTime: Mon Apr 26 21:54:33 2021
  GitCommit: 2039be00d12afaab84659619c47a463cacb039f5
  GoVersion: go1.16
  OsArch: linux/arm64
  Version: 3.2.0-dev
Package info (e.g. output of rpm -q podman or apt list podman):
I built podman from source for ubuntu 18.04 on ARM
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical
The runc OCI runtime does not support the annotation required for retaining groups, and I see no indication that this will change in the near future. I suggest you switch to crun if you require it.
There has been an open PR for many months on this, but no movement.
Yes.
I tested swapping out to crun and this actually worked!
This gives me GPU support in rootless mode, which is very exciting!
A slight note: the troubleshooting page still says:
--annotation io.crun.keep_original_groups=1
But it should be:
--annotation run.oci.keep_original_groups=1
See here for details: #4477
But --group-add keep-groups also worked, which is nice; I just had to pull mainline for it.
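For anyone landing here, the working combination described above can be sketched roughly as follows. This is only an illustration: the image name is a hypothetical placeholder, the device path is the one from the listing in this issue, and it assumes crun is installed and known to Podman under the name crun. The helper prints the command rather than running it.

```shell
# Hypothetical values: substitute your own device node and image.
DEVICE=${DEVICE:-/dev/nvhost-as-gpu}
IMAGE=${IMAGE:-ubuntu:18.04}

# Build (and only print) a rootless podman invocation that selects crun
# and asks it to keep the caller's supplementary groups.
gpu_run_cmd() {
    echo podman --runtime crun run --rm \
        --device "$1" \
        --group-add keep-groups \
        "$2" ls -l "$1"
}

gpu_run_cmd "$DEVICE" "$IMAGE"
# To actually run it: eval "$(gpu_run_cmd "$DEVICE" "$IMAGE")"
```

On a Podman build that predates --group-add keep-groups, the annotation form mentioned above (--annotation run.oci.keep_original_groups=1) would take the place of that flag.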
I do have 2 followup questions as well.
- I tried adding --group-add video myself, but this was not enough: it does not detect the GPU. Is there somewhere in my container I can see the groups that were kept/mapped from the host when I add --group-add keep-groups?
- What does keeping the original groups mean from a security perspective? Is it giving the container more privilege somehow? I mean it must, because I can now access my GPU. I have read this: https://www.redhat.com/sysadmin/supplemental-groups-podman-containers But that doesn't really answer the question I am asking. Essentially, what is the difference between running a rootless container without keeping the groups and running a rootless container keeping the groups?
Basically, the annotation is causing the OCI runtime to skip one of the normal steps of setting up a container, which involves dropping additional group memberships. I'm actually writing a blog that includes many details on this at the moment.
It definitely does increase the privileges allowed to the container: the container process, if it breaks out of the container, now has access to the groups of the user that launched Podman, which could potentially include important ones (wheel, for example).
But note, this is only for group access via GID. SELinux, dropped capabilities, the user namespace, and seccomp are still in effect. So taking advantage of wheel for sudo access is still going to be blocked.
The bottom line: if SELinux does not block access to a file that is only readable/writable via supplemental group access, and the container process breaks out, then that process would be able to read/write this file. But if a containerized process breaks out to your homedir, it most likely already has the ability to read/write everything in $HOME (luckily, SELinux blocks almost all of this access).
Not that this needs to remain open, but is there a way to see how the groups are 'kept' and where they are mapped?
So if I wanted to do this myself I could?
@vrothberg Is this something that psgo does (or could do)?
This would seem like a good job for psgo.
--hgroups
To rephrase my question, because I don't need to view the mappings per se (though it would be nice):
Essentially, is there a way I can map the groups myself with podman?
My understanding is that I needed the video group to get access to my GPU.
When I add --group-add keep-groups it works because, per my understanding, it is correctly mapping the video group.
However, when I tried --group-add video the container starts but I do not have access to my GPU; my best guess is that I am missing an important mapping step.
So I am wondering how I can do this without using --group-add keep-groups and control the mapping myself.
Or is using --group-add keep-groups the only way it will work?
When you do --group-add video, it is adding the video group defined inside of the container image to the primary process of the container.
grep video /etc/group
video:x:39:
So now inside of the container the process will have group 39, BUT this is not the same as group 39 on the host. When running rootless containers you are using a user namespace, so the group is offset by the mapping of the user namespace you have joined.
$ podman unshare cat /proc/self/gid_map
0 3267 1
1 100000 65536
Which means that the video group inside of the container is going to be GID 100038 on the host (container GIDs from 1 upward start at host GID 100000, so 39 maps to 100000 + (39 - 1)).
ctr=$(podman run -d --group-add video fedora sleep 100)
pid=$(podman top -l hpid | tail -1)
grep Groups /proc/$pid/status
Groups: 100038
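The offset arithmetic behind that 100038 can be reproduced without starting a container. The following is a minimal sketch, assuming the gid_map shown above; container_to_host_gid is a hypothetical helper name, not a podman command.

```shell
# Translate a container GID to a host GID using gid_map-style lines
# ("<container_start> <host_start> <length>"), in the format printed by
# `podman unshare cat /proc/self/gid_map`.
container_to_host_gid() {
    printf '%s\n' "$2" | awk -v gid="$1" '
        gid + 0 >= $1 && gid + 0 < $1 + $3 { print $2 + (gid - $1); found = 1; exit }
        END { if (!found) print "unmapped" }
    '
}

# Map copied from the example above.
gid_map='0 3267 1
1 100000 65536'

container_to_host_gid 39 "$gid_map"   # video inside the container -> 100038
container_to_host_gid 0 "$gid_map"    # container root group -> 3267
```

On a real system, the map can be fed in directly with "$(podman unshare cat /proc/self/gid_map)".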
In order to access the video device on the host, the process needs GID=39 on the host, so it fails. When you run with --group-add keep-groups, the OCI container runtime (crun) does not call setgroups, so the new container process keeps the groups of its parent process. If the parent process had GID=39, the processes inside of the container will still have that GID. Note that inside of the container GID 39 is not mapped, so the processes within the container will see it as the nobody group.
./bin/podman run --group-add keep-groups fedora groups
root nobody
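The "root nobody" output follows from reading the same map in the other direction: a kept host GID that falls inside no mapped range is reported inside the user namespace as the kernel's overflow GID, which usually resolves to nobody. Below is a small sketch of that lookup, assuming the example gid_map from earlier and the default overflow GID of 65534 (see /proc/sys/kernel/overflowgid); host_to_container_gid is a hypothetical helper name.

```shell
# Report how a host GID appears inside the user namespace described by
# gid_map-style lines ("<container_start> <host_start> <length>").
# Unmapped GIDs are shown as the overflow GID (assumed default: 65534).
host_to_container_gid() {
    printf '%s\n' "$2" | awk -v gid="$1" -v overflow=65534 '
        gid + 0 >= $2 && gid + 0 < $2 + $3 { print $1 + (gid - $2); found = 1; exit }
        END { if (!found) print overflow }
    '
}

# Map copied from the earlier example.
gid_map='0 3267 1
1 100000 65536'

host_to_container_gid 3267 "$gid_map"   # the user's own group -> 0 (root)
host_to_container_gid 39 "$gid_map"     # host video group -> 65534 (nobody)
```

This only shows how the IDs are displayed. The kernel still checks the real host GID when the device is opened, which is why access works even though the group appears as nobody.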
Sorry for asking in an already closed issue, but I cannot find more information about this.
Does keep-groups keep all extra groups? Or is there a limit?