docker/for-linux

Allow FUSE functionality by default

Opened this issue · 34 comments

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

Mounting FUSE filesystems should work out of the box, because it is safe and fits within the idea of a containerized app.

Actual behavior

An attempt to mount a FUSE filesystem fails with:

fuse: device not found, try 'modprobe fuse' first
or
fuse: failed to exec fusermount: No such file or directory

The only way to fix it is to run the container with additional permissions:

--cap-add SYS_ADMIN --device /dev/fuse

This makes it very difficult to run FUSE inside Docker because it is often all but impossible to run with additional flags in a managed environment.
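The two error messages above usually point at two different missing pieces: the /dev/fuse device node and the fusermount helper binary. A minimal sketch of a check to run inside the container follows; `fuse_preflight` is a hypothetical helper name, not part of Docker or libfuse, and passing it does not mean a mount will succeed, since the kernel may still refuse the mount without SYS_ADMIN.

```shell
#!/bin/sh
# Hypothetical helper: reports which FUSE prerequisite is missing.
# $1: device node to check (defaults to /dev/fuse)
# $2: mount helper to look up on PATH (defaults to fusermount)
fuse_preflight() {
    dev="${1:-/dev/fuse}"
    helper="${2:-fusermount}"
    rc=0
    # Without --device /dev/fuse the character device is absent in the container.
    if [ ! -c "$dev" ]; then
        echo "missing device: $dev"
        rc=1
    fi
    # The helper comes from the image's FUSE userspace package.
    if ! command -v "$helper" >/dev/null 2>&1; then
        echo "missing helper: $helper"
        rc=1
    fi
    [ "$rc" -eq 0 ] && echo "FUSE prerequisites present"
    return "$rc"
}

fuse_preflight || echo "FUSE is not usable in this environment yet"
```

Both arguments are optional, so the function can also be pointed at another device path or helper name when an image ships a differently named helper.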

Steps to reproduce the behavior

git clone https://github.com/rustyx/fuse-hello.git
docker build fuse-hello -t hello
docker run -it hello
docker run -it --device /dev/fuse hello
docker run -it --cap-add SYS_ADMIN --device /dev/fuse hello

Output of docker version:

Client:
 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built:         Thu Jan 11 22:29:41 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.10.1
  Git commit:   f150324
  Built:        Wed May  9 22:20:42 2018
  OS/Arch:      linux/amd64
  Experimental: false

Strongly agree this would be a great feature. It's fairly common to abstract various services via a FUSE driver. If mounting one requires root-like capabilities, it encourages lax security.

The kernel requires SYS_ADMIN; we can't change this.

@justincormack What about this: "FUSE Gets User Namespace Support With Linux 4.18"?

Just a memo below on how it doesn't work currently.

Ubuntu 18.04 with stock hwe kernel 4.18.0-18, docker 18.09.5.

docker run --rm -it --device=/dev/fuse ubuntu:18.04
apt update
apt install -y fuseiso wget
adduser --disabled-password --gecos '' test
cd /home/test
su test
mkdir mnt
wget https://cdn.openbsd.org/pub/OpenBSD/6.5/amd64/cd65.iso
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
addgroup fuse
usermod -aG fuse test
su test
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted

So at the moment it doesn't work for:

  • root
  • regular user
  • regular user added to fuse group (just in case)

Someone correct me if I am wrong; I'm trying to wrap my head around the limitations here.

The user namespace means we could do the current method more securely, perhaps without adding the SYS_ADMIN capability, but it would still require the fuse device to be passed through.

When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

If a container's OS were modified to intercept file system calls and emulate its own FUSE, then those FUSE mounts would not be accessible from the host. Is this even possible?

This prevents containers with FUSE from being used on Windows and OSX hosts.

FUSE mounting inside containers works just fine with Docker for Windows when passing the same flags: --cap-add SYS_ADMIN --device /dev/fuse.

I think the parent poster would want it to just work without any flags?

In my opinion the SYS_ADMIN is the one we shouldn't need. If only --device /dev/fuse were required.

omeid commented

@zbyte64

When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

Well, as of 4.18, you have user namespace mounts for fuse which means you shouldn't need to change the host mounts and thus wouldn't need SYS_ADMIN.
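Here "4.18" refers to the Linux kernel version. A small sketch for checking whether the running kernel is at least that version is shown below. `kernel_at_least` is a hypothetical helper, and the check only compares the numbers reported by `uname -r`, so it cannot account for distribution kernels that backport or disable the feature.

```shell
#!/bin/sh
# Hypothetical helper: kernel_at_least RELEASE MAJOR MINOR
# Succeeds if RELEASE (e.g. "4.19.76-linuxkit") is >= MAJOR.MINOR.
kernel_at_least() {
    release="$1"; want_major="$2"; want_minor="$3"
    major="${release%%.*}"      # text before the first dot
    rest="${release#*.}"        # text after the first dot
    minor="${rest%%[!0-9]*}"    # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 4 18; then
    echo "kernel reports 4.18 or newer"
else
    echo "kernel reports a version older than 4.18"
fi
```

Containers share the host kernel, so running this inside a container reports the host's kernel version.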

@omeid, what do you mean by 4.18? The latest version of blobfuse is 1.0.3. Which version of blobfuse are you using to run as non-root?

+1 on this. Requiring SYS_ADMIN is basically a non-starter for us, though the extra device shouldn't be an issue (assuming 4.18+ kernels). Can this get triaged?

The ability to run FUSE without SYS_ADMIN has been available since August 2018, and yet there hasn't been much traction on this ticket. Running in privileged mode in production should scare most security teams! Is there anything we can do to get more traction on this story?

1zg12 commented

SYS_ADMIN is quite a powerful capability. If there is a way to mount without it, that would avoid a lot of risk.

I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability; that isn't in the scope of Docker.

I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability; that isn't in the scope of Docker.

Check out this earlier comment; the Linux kernel appears to have added user namespace support for FUSE in 4.18.

Has anybody actually tried to do this?
I've added mount to my seccomp allow list and still get permission denied on mount:

/bin/fusermount: mount failed: Operation not permitted
panic: fusermount exited with code 256


goroutine 1 [running]:
main.main()
	/Users/cpuguy83/go/src/github.com/cpuguy83/tarfs/cmd/tarfsd/main.go:46 +0x697
root@6bd1a24bcd1a:/# uname -a
Linux 6bd1a24bcd1a 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 armv7l GNU/Linux

Something tells me there is much more to this than just allowing mount without CAP_SYS_ADMIN

omeid commented

@cpuguy83 Make sure you have unprivileged_userns_clone kernel param set.

@omeid That's a debian specific kernel param for enabling (or rather disabling?) userns for unprivileged users, I think?

omeid commented

Debian and Arch Linux, too. Check your kernel documentation, and also make sure the kernel is compiled with CONFIG_USER_NS.
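A sketch for inspecting the settings mentioned here is shown below. `userns_sysctl_state` is a hypothetical helper. It assumes the distribution-specific knob is exposed as /proc/sys/kernel/unprivileged_userns_clone and that the upstream limit lives in /proc/sys/user/max_user_namespaces; an "absent" result only means the file does not exist on this kernel.

```shell
#!/bin/sh
# Hypothetical helper: userns_sysctl_state FILE
# Prints "absent" if FILE is not readable, "disabled" if it contains 0,
# and "enabled" for any other value.
userns_sysctl_state() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "absent"
    elif [ "$(cat "$file")" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Distribution-specific switch (assumed path, see the note above).
echo "unprivileged_userns_clone: $(userns_sysctl_state /proc/sys/kernel/unprivileged_userns_clone)"
# Upstream per-user limit; 0 means no user namespaces may be created.
echo "max_user_namespaces: $(userns_sysctl_state /proc/sys/user/max_user_namespaces)"
```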

@omeid I can create a userns just fine, what I can't do is mount in the userns w/o CAP_SYS_ADMIN.
I'm attempting to do this by taking the default seccomp profile and adding unshare and mount to the allow list.

Any updates on this since?

I need this as well, and giving my containers SYS_ADMIN permissions just for FUSE is not an option

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

  • Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.
  • Ensure the fuse module is loaded
  • Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json
  • In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.
  • In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).
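The steps above can be sketched as a short command sequence. This is a dry-run sketch, not a tested setup: `fuse_run_cmd` is a hypothetical wrapper that only prints the docker command, `hello` is the image name from the reproduction steps at the top of the issue, and /path/to/fuse.json stands for the patched seccomp profile from the gist.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the docker command instead of running it.
# $1: path to the patched seccomp profile
# $2: image name
fuse_run_cmd() {
    printf 'docker run -it --device /dev/fuse --security-opt seccomp=%s %s\n' "$1" "$2"
}

# On the host: make sure the fuse module is loaded (needs root).
#   modprobe fuse

# On the host: start the container with the device and the patched profile.
fuse_run_cmd /path/to/fuse.json hello

# Inside the container: open a shell in new user and mount namespaces,
# then mount from that shell.
#   unshare -c --keep-caps -m
#   sshfs user@host:directory /mnt
```

Printing the command first makes it easy to review the flags before running them by hand.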

@juergbi why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile? I'm trying to understand the issue and also need FUSE in docker.

why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?

I'm not sure I understand your question. (The answer seems too obvious to me, so I must be misinterpreting it.)

FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.

@juergbi why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile? I'm trying to understand the issue and also need FUSE in docker.

Mounting anything (FUSE and other filesystems) requires CAP_SYS_ADMIN privileges even without seccomp restrictions. Outside Docker, unprivileged users can run sshfs with the help of the setuid-root helper binary fusermount. However, in a Docker container setuid fusermount is not supported and hence, sshfs fails unless the Docker container is privileged.

The mentioned unshare command grants CAP_SYS_ADMIN privileges in new user and mount namespaces. This doesn't provide any additional access to the host system, however, it allows mount operations in that new mount namespace. With Linux 4.18 and later, FUSE mounts are allowed in that new mount namespace as well. So sshfs can work inside the new namespaces.

Other container engines may create an unprivileged user namespace as part of container startup, which may allow mounts without the extra unshare step. However, Docker doesn't work that way with its system daemon.
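One way to see the effect described above is to compare the effective capability mask before and after the unshare step. The sketch below assumes CAP_SYS_ADMIN is bit 21 of the mask, as defined in linux/capability.h, and reads the `CapEff` line from /proc. `has_cap_sys_admin` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: has_cap_sys_admin HEXMASK
# Succeeds if bit 21 (CAP_SYS_ADMIN) is set in the hexadecimal mask.
has_cap_sys_admin() {
    [ $(( (0x$1 >> 21) & 1 )) -eq 1 ]
}

# Report on the current shell, if /proc is available.
if [ -r "/proc/$$/status" ]; then
    capeff=$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")
    if [ -n "$capeff" ] && has_cap_sys_admin "$capeff"; then
        echo "CapEff=$capeff: CAP_SYS_ADMIN present in this namespace"
    else
        echo "CapEff=$capeff: CAP_SYS_ADMIN not present"
    fi
fi
```

A capability reported inside the new user namespace applies only to resources owned by that namespace, which is why it does not grant extra access to the host.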

FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.

I know. I meant that he runs sshfs inside an unshare'd namespace inside Docker.

I'm trying to understand the issue and also need FUSE in docker.

Slightly unrelated, but I also use FUSE in Docker to mount ISO files as a non-root user.

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

@juergbi: I was able to replicate this setup on Ubuntu 18.04. I used -r instead of -c because the util-linux shipped in Ubuntu 18.04 does not have -c.

FUSE works :) But I want to be able to install Ubuntu packages in the unshared shell (apt-get install foo). I get this error:

W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (1: Operation not permitted)
W: chown to _apt:root of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (1: Operation not permitted)

Do you have any suggestions to work around this?
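A likely cause of the "22: Invalid argument" warnings is the uid mapping of the new user namespace: with `unshare -r`, only the calling user's uid is mapped (to root), so chown to any other uid such as _apt has no valid target. The sketch below counts the ids mapped in the current namespace from /proc/self/uid_map. `mapped_id_count` is a hypothetical helper, and this only illustrates the diagnosis; it is not a fix.

```shell
#!/bin/sh
# Hypothetical helper: reads uid_map-format lines from stdin
# ("inside-id outside-id length") and prints the total number of mapped ids.
mapped_id_count() {
    total=0
    while read -r _inside _outside length; do
        total=$((total + length))
    done
    echo "$total"
}

# Report on the current namespace, if /proc is available.
if [ -r /proc/self/uid_map ]; then
    echo "mapped uids: $(mapped_id_count < /proc/self/uid_map)"
fi
```

A count of 1 means only a single uid exists inside the namespace, so any chown to a different owner is expected to fail there.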

Any progress on this?

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

Can we possibly get a Docker image of this config?

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

Are there any security implications in doing so? I want to allow untrusted users to access FUSE for rclone mount, but it would be great if they couldn't access the host's filesystem.

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

@juergbi Thanks to this reply I am 99% of the way to a working setup using sshfs inside Kubernetes, but I cannot seem to write to the sshfs mount. I have been able to replicate this on my local machine (without using Docker/Kubernetes). Mounting over sshfs works in an unshared shell, but I cannot write to the mount. Using the exact same mount command outside of the unshared shell gives me write access, so I am sure it is not an issue on the remote server. Any suggestions on how to fix this?

any update?

Any progress?

I don't think it's helpful to ping everyone for progress updates here. If there is any progress, it will be reported by the ones making it, in either this issue or a PR (pull request).

For future readers: please refrain from commenting every week to ask for updates, as this is inappropriate behavior. This is open source software. If you really want an update, make it yourself by creating a PR that fixes the issue; otherwise, wait for the update.