containers/bubblewrap

bubblewrap inside unprivileged docker

aurium opened this issue · 11 comments

bubblewrap is becoming a popular sandboxing tool, so we need to be able to use it inside an unprivileged Docker container to containerize solutions.

As you may know, bwrap works correctly in a privileged container:

$ docker run \
    --privileged \
    -v $HOME/SteamHome:/myself \
    -e HOME=/myself \
    -w /myself \
    -ti --entrypoint /bin/bash \
    ubuntu:jammy
# Ok! We are inside a privileged docker
root@2b7dfc1b3179:~# bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /etc /etc --ro-bind /lib /lib --ro-bind /lib32 /lib32 --ro-bind /lib64 /lib64 --dir /tmp --dir /var --proc /proc --dev /dev --unshare-all --share-net --die-with-parent --dir /run/user/$(id -u) --bind /tmp /SteamHome --chdir /SteamHome /bin/bash
root@2b7dfc1b3179:/SteamHome#
# Great! Not so great, because privileged docker services are not really jailed.

You may also know that it won't work if you simply remove --privileged:

$ docker run \
    -v $HOME/SteamHome:/myself \
    -e HOME=/myself \
    -w /myself \
    -ti --entrypoint /bin/bash \
    ubuntu:jammy
# Now we are inside an unprivileged docker
root@2b7dfc1b3179:~# bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /etc /etc --ro-bind /lib /lib --ro-bind /lib32 /lib32 --ro-bind /lib64 /lib64 --dir /tmp --dir /var --proc /proc --dev /dev --unshare-all --share-net --die-with-parent --dir /run/user/$(id -u) --bind /tmp /SteamHome --chdir /SteamHome /bin/bash
bwrap: No permissions to create new namespace, likely because the kernel does not allow non-privileged user namespaces. See <https://deb.li/bubblewrap> or <file:///usr/share/doc/bubblewrap/README.Debian.gz>.
# Expected fail

Now let's try granting every capability. Once that succeeds, we can remove them one by one until only the necessary capabilities remain:

DEVICES='--device=/dev/rtc'
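for dev in /dev/*; do
  # Symlinks are only reported, never passed as devices
  test -h "$dev" && echo "Not shared: $(ls -l "$dev")" || true
  # Anything that is neither a directory nor a symlink is passed with --device
  test -d "$dev" -o -h "$dev" || DEVICES="$DEVICES --device=$dev"
  # Directories (including symlinks that resolve to one) are bind-mounted
  test -d "$dev" && DEVICES="$DEVICES -v=$dev:$dev" || true
done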
Not shared: lrwxrwxrwx 1 root root 11 abr  1 08:58 /dev/core -> /proc/kcore
Not shared: lrwxrwxrwx 1 root root 13 abr  1 08:58 /dev/fd -> /proc/self/fd
Not shared: lrwxrwxrwx 1 root root 12 abr  1 08:58 /dev/initctl -> /run/initctl
Not shared: lrwxrwxrwx 1 root root 28 abr  1 08:58 /dev/log -> /run/systemd/journal/dev-log
Not shared: lrwxrwxrwx 1 root root 4 abr  1 08:58 /dev/rtc -> rtc0
Not shared: lrwxrwxrwx 1 root root 15 abr  1 08:58 /dev/stderr -> /proc/self/fd/2
Not shared: lrwxrwxrwx 1 root root 15 abr  1 08:58 /dev/stdin -> /proc/self/fd/0
Not shared: lrwxrwxrwx 1 root root 15 abr  1 08:58 /dev/stdout -> /proc/self/fd/1
docker run \
    --cap-add SYS_CHROOT \
    --cap-add SYS_ADMIN \
    --cap-add SETUID \
    --cap-add SETGID \
    --cap-add SYS_PTRACE \
    --cap-add NET_ADMIN \
    --cap-add AUDIT_WRITE \
    --cap-add CHOWN \
    --cap-add DAC_OVERRIDE \
    --cap-add FOWNER \
    --cap-add FSETID \
    --cap-add KILL \
    --cap-add MKNOD \
    --cap-add NET_BIND_SERVICE \
    --cap-add NET_RAW \
    --cap-add SETFCAP \
    --cap-add SETGID \
    --cap-add SETPCAP \
    --cap-add SETUID \
    --cap-add SYS_CHROOT \
    --cap-add AUDIT_CONTROL \
    --cap-add AUDIT_READ \
    --cap-add BLOCK_SUSPEND \
    --cap-add DAC_READ_SEARCH \
    --cap-add IPC_LOCK \
    --cap-add IPC_OWNER \
    --cap-add LEASE \
    --cap-add LINUX_IMMUTABLE \
    --cap-add MAC_ADMIN \
    --cap-add MAC_OVERRIDE \
    --cap-add NET_BROADCAST \
    --cap-add SYS_BOOT \
    --cap-add SYS_MODULE \
    --cap-add SYS_NICE \
    --cap-add SYS_PACCT \
    --cap-add SYS_PTRACE \
    --cap-add SYS_RAWIO \
    --cap-add SYS_RESOURCE \
    --cap-add SYS_TIME \
    --cap-add SYS_TTY_CONFIG \
    --cap-add SYSLOG \
    --cap-add WAKE_ALARM \
    $DEVICES \
    -v $HOME/SteamHome:/myself \
    -e HOME=/myself \
    -w /myself \
    -ti --entrypoint /bin/bash \
    ubuntu:jammy
# Ok... It looks like the `--privileged` result.
root@335ec51ae632:~# bwrap --ro-bind /usr /usr --ro-bind /bin /bin --ro-bind /etc /etc --ro-bind /lib /lib --ro-bind /lib32 /lib32 --ro-bind /lib64 /lib64 --dir /tmp --dir /var --proc /proc --dev /dev --unshare-all --share-net --die-with-parent --dir /run/user/$(id -u) --bind /tmp /SteamHome --chdir /SteamHome /bin/bash
bwrap: Failed to make / slave: Permission denied
# Oh! Unexpected fail!

To be sure, I ran capsh --print in both the --privileged attempt and the attempt with every --cap-add. Both give me the same result:

Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =
Current IAB: !cap_perfmon,!cap_bpf,!cap_checkpoint_restore
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=1000(myself) euid=1000(myself)
gid=1000(myself)
groups=
Guessed mode: UNCERTAIN (0)
  • So... Why the "Failed to make / slave: Permission denied" result?
  • Did I miss a Docker parameter?
  • Should I run another command to provide some missing information?
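
One way to compare the two runs mechanically, rather than by eye, is to extract the bounding set from each `capsh --print` output and diff the results. This is only a sketch: the helper name `bounding_caps` and the file names in the usage comment are made up for illustration. Note that `capsh` reports capabilities only, so identical output does not show that the two containers are confined identically (seccomp and AppArmor state are not part of it).

```shell
#!/usr/bin/env bash
# Read `capsh --print` output on stdin and print the bounding-set
# capabilities one per line, sorted, so two runs can be diffed.
bounding_caps() {
    sed -n 's/^Bounding set *=//p' | tr ',' '\n' | sed '/^$/d' | sort
}

# Hypothetical usage: the two files hold `capsh --print` output saved from
# the --privileged container and from the all --cap-add container.
#   diff <(bounding_caps < privileged.txt) <(bounding_caps < capadd.txt)
```

An empty diff confirms what was observed above: the capability sets match, so the difference has to come from somewhere else.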

You are missing apparmor

--security-opt apparmor=unconfined

The setuid variant can potentially run as root in the container but I have not gotten that working.
I discovered this while putting together a steamos container.

You are also missing the seccomp filter

--security-opt seccomp=unconfined

Docker's default seccomp filter blocks the clone and unshare syscalls (among others), which bubblewrap needs to create a new namespace.
Podman's seccomp filter is more permissive. Bubblewrap works (with a few limitations) in an unprivileged Podman container.
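
To see from inside the container whether a seccomp filter or an AppArmor profile is active, something like the following can help. It is a rough sketch: the function names are made up, and it relies on the `Seccomp:` field of `/proc/<pid>/status` (0 = disabled, 1 = strict, 2 = filter) and on `/proc/self/attr/current`, which shows the AppArmor label only when AppArmor is the active LSM for that interface.

```shell
#!/usr/bin/env bash
# Read /proc/<pid>/status-style text on stdin and name the seccomp mode.
seccomp_mode() {
    awk '/^Seccomp:/ { found = 1; mode = $2 }
         END {
             if (!found)         print "unknown"
             else if (mode == 0) print "disabled"
             else if (mode == 1) print "strict"
             else if (mode == 2) print "filter"
             else                print "unknown"
         }'
}

# Print the current LSM label, e.g. "docker-default (enforce)" or
# "unconfined" under AppArmor. On SELinux systems this file holds the
# SELinux context instead.
apparmor_label() {
    cat /proc/self/attr/current 2>/dev/null || echo "unavailable"
}

# Usage inside the container:
#   seccomp_mode < /proc/self/status
#   apparmor_label
```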

I'm experiencing the same behaviour trying out bubblewrap inside a k8s pod - even with seccomp set to Unconfined.

For me it was enough to add
--cap-add SYS_ADMIN --security-opt apparmor=unconfined --security-opt seccomp=unconfined
as advised by @s-hamann and @thelamer.
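
Put together, the three options could be wrapped like this. It is a minimal sketch: `run_bwrap_capable` and the `DOCKER` override are names invented here, and the image and extra options in the usage comment are simply the ones from the original post. Keep in mind that this turns off both the seccomp filter and the AppArmor profile for the container, so it is considerably less confined than a default one.

```shell
#!/usr/bin/env bash
# Run an image with the three options reported to be sufficient above.
# First argument: image; remaining arguments: extra `docker run` options.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead.
run_bwrap_capable() {
    local image="$1"
    shift
    ${DOCKER:-docker} run \
        --cap-add SYS_ADMIN \
        --security-opt apparmor=unconfined \
        --security-opt seccomp=unconfined \
        "$@" "$image"
}

# Usage, with the options from the original post:
#   run_bwrap_capable ubuntu:jammy -ti --entrypoint /bin/bash
```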

smcv commented

bubblewrap cannot work if it's run inside a container that doesn't allow the necessary syscalls, mount operations, etc. to let bubblewrap do its job. The precise permissions that are required are not obvious, partly because the kernel gives us very little diagnostic information when we don't have them ("Permission denied" is as much as we get).
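
Since the kernel says so little, a quick probe before blaming bwrap can at least narrow things down. The sketch below is not part of bubblewrap; `probe_userns` is a made-up name, and it uses `unshare` from util-linux to attempt a new user and mount namespace. Success here does not guarantee that bwrap will work, because bwrap performs further mount operations, but a failure suggests the container itself is refusing the request.

```shell
#!/usr/bin/env bash
# Try to create a user + mount namespace the way an unprivileged sandbox
# would, and report the outcome as a single word or phrase.
probe_userns() {
    if ! command -v unshare >/dev/null 2>&1; then
        echo "unshare not installed"
    elif unshare --user --map-root-user --mount true 2>/dev/null; then
        echo "ok"
    else
        echo "denied"
    fi
}

# Usage inside the container:
#   probe_userns
```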

This isn't a bubblewrap bug: doing impossible things is out-of-scope for this project.

I think this is probably a common request, and my team and I have been looking for it too: people would like to use bubblewrap (or something similar) in a confined environment (for example, OpenShift in its default configuration). I guess clearly documenting what's required might both help users and cut down the noise.

I am experiencing the same problems on Ubuntu 24.04, using bwrap in a Docker container. apparmor=unconfined (which is included in the --privileged option) is not enough, because it only disables some AppArmor profiles, and those profiles are far from ideal, to put it mildly. The AppArmor profiles actually look like one bug stacked on another, with one main bug driving the whole construction. The solution is the following:

abi <abi/4.0>,
include <tunables/global>

profile bwrap /usr/bin/bwrap flags=(unconfined) {
  userns,
  include if exists <local/bwrap>
}

You need to put this profile in /etc/apparmor.d/usr.bin.bwrap (on the host machine) and run systemctl restart apparmor.service.
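
Those two steps could be scripted roughly as follows. This is only a sketch of the steps described in this comment: the function name `install_bwrap_profile` is invented, the destination defaults to the path given above, and it has to run as root on the machine where AppArmor is enforced.

```shell
#!/usr/bin/env bash
# Write the profile from this comment to the given path
# (default: /etc/apparmor.d/usr.bin.bwrap).
install_bwrap_profile() {
    local dest="${1:-/etc/apparmor.d/usr.bin.bwrap}"
    cat > "$dest" <<'EOF'
abi <abi/4.0>,
include <tunables/global>

profile bwrap /usr/bin/bwrap flags=(unconfined) {
  userns,
  include if exists <local/bwrap>
}
EOF
}

# Usage, as root:
#   install_bwrap_profile && systemctl restart apparmor.service
```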

smcv commented

@andrew-aladjev:

> I am experiencing the same problems in ubuntu 24.04

Not really: you are experiencing a new, different problem that has a similar symptom.

Ubuntu has changed the Ubuntu 24.04 kernel so that programs like bubblewrap are not allowed to create a new user namespace unless they are given an AppArmor profile that contains the userns permission. This is their choice, and if it's causing a problem for you, please report it to them. Changes in bubblewrap are not going to solve this.

A relevant Ubuntu bug is https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844.
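
To tell this Ubuntu-specific case apart from the Docker one, it may help to check whether the restriction is switched on. As far as I know Ubuntu exposes it as the sysctl `kernel.apparmor_restrict_unprivileged_userns`; treat that name, and the helper name `userns_restriction`, as assumptions of this sketch rather than something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Report whether the AppArmor restriction on unprivileged user namespaces
# is enabled. The path argument exists only to make the check testable.
userns_restriction() {
    local f="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            0) echo "not restricted" ;;
            *) echo "restricted" ;;
        esac
    else
        echo "sysctl not present"
    fi
}

# Usage on the host:
#   userns_restriction
```

If this prints "sysctl not present", the kernel most likely does not carry this restriction and the cause is elsewhere.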

Ubuntu developers have said that they are intentionally not adding a profile like the one you've suggested (reference: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844/comments/90, https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2046844/comments/91). What they are doing instead is adding a profile for each program that uses bubblewrap, including Flatpak, Steam, nautilus/GNOME Files (via libgnome-desktop), epiphany/GNOME Web (via WebKitGTK) and so on, as well as adding a profile for each program that does not use bubblewrap but does similar things a different way, such as Firefox and Chrome. If you are using some different program that invokes bwrap - for example mkosi - my understanding is that they would tell you to add a profile for that program instead of a profile for bwrap.

I personally think their stated reasoning is flawed: they say that the reason is that giving bwrap a profile like this would allow for an arbitrary bypass of their restriction, but programs like the ones for which they are adding profiles are not designed to impose a security boundary that distrusts their caller either, so it's straightforward for an unprivileged user to bypass their restriction anyway. But I didn't design their security model, and what they choose to do in their distro is not my decision.

> Ubuntu has changed the Ubuntu 24.04 kernel so that programs like bubblewrap are not allowed to create a new user namespace unless they are given an AppArmor profile that contains the userns permission. This is their choice

I wonder how long it would take for them to reconsider that choice.

smcv commented

> I wonder how long it would take for them to reconsider that choice.

This is not Ubuntu's issue tracker and we have no control over what they do, so please take any speculation or advocacy about this to Ubuntu/Canonical issue trackers rather than here.

> What they are doing instead is adding a profile for each program that uses bubblewrap, including Flatpak, Steam, nautilus/GNOME Files (via libgnome-desktop), epiphany/GNOME Web (via WebKitGTK) and so on, as well as adding a profile for each program that does not use bubblewrap but does similar things a different way, such as Firefox and Chrome. If you are using some different program that invokes bwrap - for example mkosi - my understanding is that they would tell you to add a profile for that program instead of a profile for bwrap.

It would be good to add this information to the bwrap docs, even though it is specific to Ubuntu. Thank you.