containers/crun-vm

Make Docker work without `--security-opt label=disable`

albertofaria opened this issue · 10 comments

crun-vm currently only works with Docker if --security-opt label=disable is given. Without it, a call to umount2() made by passt fails with EPERM.

Do you have the AVC messages?

AFAICT there are no messages

After it fails with Docker, do sudo ausearch -m avc -ts recent

$ docker run --runtime crun-vm -it --rm quay.io/containerdisks/fedora:39 ""
error: Failed to start domain 'domain'
error: internal error: Child process (/usr/bin/passt --one-off --socket /run/libvirt/qemu/passt/1-domain-net0.socket --pid /run/libvirt/qemu/passt/1-domain-net0-passt.pid --tcp-ports all --udp-ports all) unexpected exit status 1: Don't run as root. Changing to nobody...
No routable interface for IPv6: IPv6 is disabled
Template interface: eth0 (IPv4)
MAC:
    host: 02:42:ac:11:00:02
DHCP:
    assign: 172.17.0.2
    mask: 255.255.0.0
    router: 172.17.0.1
DNS:
    192.168.1.254
    192.168.1.254
DNS search list:
    Home
UNIX domain socket bound at /run/libvirt/qemu/passt/1-domain-net0.socket

You can now start qemu (>= 7.2, with commit 13c6be96618c):
    kvm ... -device virtio-net-pci,netdev=s -netdev stream,id=s,server=off,addr.type=unix,addr.path=/run/libvirt/qemu/passt/1-domain-net0.socket
or qrap, for earlier qemu versions:
    ./qrap 5 kvm ... -net socket,fd=5 -net nic,model=virtio
umount2: Permission denied
Failed to sandbox process, exiting

$ sudo ausearch -m avc -ts recent
<no matches>

I am questioning whether this is an SELinux issue or a seccomp issue.

If you do sudo setenforce 0 to put SELinux into permissive mode, does the docker command work? I don't believe docker runs with SELinux on by default unless you installed moby-engine.

docker run alpine cat /proc/self/attr/current

If this comes back with a type other than container_t, then SELinux separation is not enabled.
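A rough sketch of how that check could be scripted. The helper names `selinux_type` and `is_container_confined` are made up for illustration and are not part of docker or crun-vm; the only assumption is that the label has the usual user:role:type:level form.

```shell
# Hypothetical helper: print the type field of an SELinux context.
# Contexts have the form user:role:type:level, so the type is field 3.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper: succeed only if the context's type is container_t.
is_container_confined() {
    [ "$(selinux_type "$1")" = "container_t" ]
}

# Possible usage (not run here; tr strips a trailing NUL byte if the kernel adds one):
#   label=$(docker run --rm alpine cat /proc/self/attr/current | tr -d '\0')
#   is_container_confined "$label" && echo "SELinux separation is enabled"
```

If the label has no colons at all (for example when SELinux is not in use), `cut` passes the string through unchanged and the comparison simply fails.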

I could understand you having to run

docker run --security-opt seccomp=unconfined ...

to run QEMU.

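If it works with setenforce 0, could you try the following? semodule -DB rebuilds the policy with dontaudit rules disabled, so denials that are normally silenced get logged; semodule -B restores them afterwards.

sudo setenforce 1
sudo semodule -DB
Run the test with docker and crun-vm.
sudo ausearch -m avc -ts recent
sudo semodule -B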

It works after sudo setenforce 0, stops working after sudo setenforce 1, then:

$ sudo semodule -DB
$ sudo ausearch -m avc -ts recent
----
time->Sat Mar  2 12:22:48 2024
type=AVC msg=audit(1709382168.712:4989): avc:  denied  { unmount } for  pid=461770 comm="passt.avx2" scontext=system_u:system_r:container_t:s0:c718,c722 tcontext=system_u:object_r:tmpfs_t:s0 tclass=filesystem permissive=0
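For reading records like the one above, here is a small sketch that pulls out the source type, the denied permission, and the target type and class. The function names `avc_field`, `avc_perms`, and `avc_summary` are hypothetical, and the parsing assumes the single-line `key=value` layout shown in this record.

```shell
# Hypothetical helper: print the value of one key=value field from an AVC record.
# $1 = field name (e.g. scontext), $2 = the AVC record line.
avc_field() {
    printf '%s\n' "$2" | sed -n "s/.* $1=\([^ ]*\).*/\1/p"
}

# Hypothetical helper: print the permission(s) listed between the braces.
avc_perms() {
    printf '%s\n' "$1" | sed -n 's/.*{ \(.*\) }.*/\1/p'
}

# Hypothetical helper: one-line summary "<source type> was denied { perm } on <target type>:<class>".
avc_summary() {
    avc_line=$1
    avc_stype=$(avc_field scontext "$avc_line" | cut -d: -f3)
    avc_ttype=$(avc_field tcontext "$avc_line" | cut -d: -f3)
    printf '%s was denied { %s } on %s:%s\n' \
        "$avc_stype" "$(avc_perms "$avc_line")" \
        "$avc_ttype" "$(avc_field tclass "$avc_line")"
}
```

Fed the record above, `avc_summary` should report that container_t was denied unmount on a tmpfs_t filesystem, which matches the umount2() failure from passt.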

Thanks. Do you know if that will eventually propagate into Fedora 39?

Yes, it should. Not sure why the automatic build did not happen. @lsm5 ?

https://github.com/containers/container-selinux/releases/tag/v2.230.0

Might be because v2.229.1 is still in updates testing.