Allow FUSE functionality by default

Question

Opened this issue 6 years ago · 34 comments

This is a bug report
This is a feature request
I searched existing issues before opening this one

Expected behavior

Mounting FUSE filesystems should work out of the box, because it is safe. It fits within the idea of a containerized app.

Actual behavior

An attempt to mount a FUSE filesystem fails with:

fuse: device not found, try 'modprobe fuse' first
or
fuse: failed to exec fusermount: No such file or directory

The only way to fix it is to run the container with additional permissions:

--cap-add SYS_ADMIN --device /dev/fuse

This makes it very difficult to run FUSE inside Docker because it is often all but impossible to run with additional flags in a managed environment.
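
The two error messages above point at two different missing pieces: the device node and the helper binary. A minimal sketch of a check that tells them apart from inside the container is given here; the function name `diagnose_fuse` and the list of helper names are assumptions for illustration, not part of the original report.

```python
import os
import shutil


def diagnose_fuse(dev_path="/dev/fuse", helper_names=("fusermount3", "fusermount")):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # "fuse: device not found" typically means the device node is not visible here.
    if not os.path.exists(dev_path):
        problems.append(f"{dev_path} is missing: pass --device /dev/fuse "
                        "and make sure the fuse module is loaded on the host")
    # "failed to exec fusermount" means no helper binary is installed in the image.
    if not any(shutil.which(name) for name in helper_names):
        problems.append("no fusermount helper on PATH: install your "
                        "distribution's fuse package in the image")
    return problems


if __name__ == "__main__":
    for line in diagnose_fuse() or ["device node and helper binary are both present"]:
        print(line)
```

Passing both checks does not mean the mount will succeed; it only rules out the two failures quoted above, and the permission problem remains.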

Steps to reproduce the behavior

git clone https://github.com/rustyx/fuse-hello.git
docker build fuse-hello -t hello
docker run -it hello
docker run -it --device /dev/fuse hello
docker run -it --cap-add SYS_ADMIN --device /dev/fuse hello

Output of docker version:

Client:
 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built:         Thu Jan 11 22:29:41 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.10.1
  Git commit:   f150324
  Built:        Wed May  9 22:20:42 2018
  OS/Arch:      linux/amd64
  Experimental: false

Answer 1 · 2019-01-04T20:52:35.000Z

Strongly agree this would be a great feature. It's fairly common to abstract various services via a FUSE driver. If mounting one requires root-like capabilities it encourages lax security.

Answer 2 · 2019-03-01T13:11:09.000Z

The kernel requires SYS_ADMIN; we can't change this.

Answer 3 · 2019-04-30T13:37:04.000Z

@justincormack What about this: "FUSE Gets User Namespace Support With Linux 4.18"?

Just a memo below on how it doesn't work currently.

Ubuntu 18.04 with stock hwe kernel 4.18.0-18, docker 18.09.5.

docker run --rm -it --device=/dev/fuse ubuntu:18.04
apt update
apt install -y fuseiso wget
adduser --disabled-password --gecos '' test
cd /home/test
su test
mkdir mnt
wget https://cdn.openbsd.org/pub/OpenBSD/6.5/amd64/cd65.iso
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
addgroup fuse
usermod -aG fuse test
su test
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted

So at the moment it doesn't work for:

* root
* regular user
* regular user added to fuse group (just in case)

Answer 4 · 2019-07-21T15:18:24.000Z

Someone correct me if I am wrong; I'm trying to wrap my head around the limitations here.

The user namespace means we could do the current method more securely, perhaps without adding the SYS_ADMIN capabilities, but would still require the fuse device to be passed through.

When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

If a container's OS was modified to intercept file system calls to emulate its own FUSE then those FUSE mounts would not be accessible from the host. Is this even possible?

Answer 5 · 2019-07-21T22:25:17.000Z

This prevents containers with FUSE from being used on Windows and OSX hosts.

FUSE mounting inside containers works just fine with Docker for Windows, when passing the same flags: --cap-add SYS_ADMIN --device /dev/fuse.

I think the parent poster would want it to just work without any flags?

In my opinion the SYS_ADMIN is the one we shouldn't need. If only --device /dev/fuse were required.

Answer 6 · 2019-07-22T01:20:04.000Z

@zbyte64

When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

Well, as of 4.18, you have user namespace mounts for fuse which means you shouldn't need to change the host mounts and thus wouldn't need SYS_ADMIN.
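
A quick way to test the "as of 4.18" precondition is to compare the running kernel release against that version. The sketch below assumes a `uname -r` style string; the helper name `kernel_at_least` is made up for illustration.

```python
import platform
import re


def kernel_at_least(release, major=4, minor=18):
    """True if a `uname -r` style string such as '4.19.76-linuxkit' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    release = platform.release()
    verdict = "new enough" if kernel_at_least(release) else "older than 4.18"
    print(release, "->", verdict)
```

This is a necessary condition only: the test earlier in this thread ran on 4.18.0-18 and the mount still failed, because the container's seccomp profile and capabilities also have to permit it.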

Answer 7 · 2019-08-01T07:49:54.000Z

@omeid, what do you mean by 4.18? Isn't the latest version of blobfuse 1.0.3? Which version of blobfuse are you using to run as non-root?

Answer 8 · 2019-08-01T11:18:45.000Z

@cometta He means this #321 (comment)

Answer 9 · 2019-12-13T22:12:26.000Z

+1 on this. Requiring SYS_ADMIN is basically a non-starter for us, though the extra device shouldn't be an issue (assuming 4.18+ kernels). Can this get triaged?

Answer 10 · 2019-12-24T05:25:22.000Z

The ability to run FUSE without SYS_ADMIN has been available since August 2018, and yet there hasn't been much traction on this ticket. Running in privileged mode in production should scare most security teams! Is there anything we can do to get more traction on this story?

Answer 11 · 2020-02-24T08:27:54.000Z

SYS_ADMIN is quite a powerful capability. If there is a way to mount without it, that could avoid a lot of risk.

Answer 12 · 2020-03-16T08:46:38.000Z

I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability. That isn't in the scope of Docker.

Answer 13 · 2020-03-17T16:02:14.000Z

I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability. That isn't in the scope of Docker.

Check out this earlier comment; the Linux kernel appears to have added namespace support for FUSE in 4.18.

Answer 14 · 2020-03-18T23:03:55.000Z

Has anybody actually tried to do this?
I've added mount to my seccomp allow list and still get permission denied on mount:

/bin/fusermount: mount failed: Operation not permitted
panic: fusermount exited with code 256


goroutine 1 [running]:
main.main()
	/Users/cpuguy83/go/src/github.com/cpuguy83/tarfs/cmd/tarfsd/main.go:46 +0x697
root@6bd1a24bcd1a:/# uname -a
Linux 6bd1a24bcd1a 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 armv7l GNU/Linux

Something tells me there is much more to this than just allowing mount without CAP_SYS_ADMIN.

Answer 15 · 2020-03-21T01:29:11.000Z

@cpuguy83 Make sure you have unprivileged_userns_clone kernel param set.

Answer 16 · 2020-03-23T17:47:28.000Z

@omeid That's a debian specific kernel param for enabling (or rather disabling?) userns for unprivileged users, I think?

Answer 17 · 2020-03-24T03:07:23.000Z

Debian, and Arch Linux too. Check your kernel documentation, and also make sure the kernel is compiled with CONFIG_USER_NS.
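
For anyone checking these settings, here is a small sketch that reads the relevant sysctl files. It assumes the distribution-specific `kernel.unprivileged_userns_clone` file exists only on kernels carrying that patch, and also looks at the mainline `user.max_user_namespaces` limit; the function name `userns_settings` is hypothetical.

```python
from pathlib import Path


def userns_settings(proc_sys="/proc/sys"):
    """Read the sysctls that commonly gate unprivileged user namespaces.

    A value of None means the file does not exist on this kernel.
    """
    names = {
        # Distribution patch: 0 disables user namespaces for unprivileged users.
        "kernel.unprivileged_userns_clone": "kernel/unprivileged_userns_clone",
        # Mainline limit: 0 means no user namespaces can be created at all.
        "user.max_user_namespaces": "user/max_user_namespaces",
    }
    settings = {}
    for sysctl, relative in names.items():
        path = Path(proc_sys) / relative
        settings[sysctl] = int(path.read_text().strip()) if path.is_file() else None
    return settings


if __name__ == "__main__":
    # /proc/self/ns/user is only present when the kernel has CONFIG_USER_NS.
    print("user namespaces compiled in:", Path("/proc/self/ns/user").exists())
    for name, value in userns_settings().items():
        print(name, "=", value)
```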

Answer 18 · 2020-03-24T16:21:28.000Z

@omeid I can create a userns just fine, what I can't do is mount in the userns w/o CAP_SYS_ADMIN.
I'm attempting to do this by taking the default seccomp profile and adding unshare and mount to the allow list.

Answer 19 · 2020-06-06T22:42:25.000Z

Any updates on this since?

Answer 20 · 2020-06-16T11:31:03.000Z

I need this as well, and giving my containers SYS_ADMIN permissions just for FUSE is not an option.

Answer 21 · 2020-08-20T15:43:40.000Z

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

* Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.
* Ensure the fuse module is loaded
* Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json
* In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.
* In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).
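
The profile-patching step can be sketched in a few lines. This is not the content of the linked gist: it is a simplified variant that assumes the profile follows the layout of Docker's default.json (a top-level "syscalls" list of rules with "names" and "action") and appends one unconditional allow rule. Instead of relaxing the clone(2) rule it allows unshare(2), which is the syscall the unshare tool uses. The syscall list and the function name `allow_syscalls` are assumptions.

```python
import copy
import json
import sys

# Assumption: these are the syscalls the unshare + FUSE mount workflow needs.
# "umount2" is the name on most architectures; "umount" exists on some.
EXTRA_SYSCALLS = ["mount", "umount", "umount2", "unshare"]


def allow_syscalls(profile, names=EXTRA_SYSCALLS):
    """Return a copy of a seccomp profile with one extra unconditional allow rule."""
    patched = copy.deepcopy(profile)
    patched.setdefault("syscalls", []).append(
        {"names": list(names), "action": "SCMP_ACT_ALLOW"}
    )
    return patched


if __name__ == "__main__" and len(sys.argv) == 2:
    # Reads the profile named on the command line, writes the patched one to stdout.
    with open(sys.argv[1]) as handle:
        json.dump(allow_syscalls(json.load(handle)), sys.stdout, indent=2)
```

The output would then be saved as the fuse.json passed to --security-opt seccomp=. Review the resulting profile before relying on it, since every rule added here widens what the container may ask of the kernel.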

Answer 22 · 2020-12-17T23:48:08.000Z

@juergbi why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile? I'm trying to understand the issue and also need FUSE in docker.

Answer 23 · 2020-12-18T00:52:03.000Z

why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?

I'm not sure I understand your question. (The answer seems too obvious to me, so I must be misinterpreting it.)

FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.

Answer 24 · 2020-12-18T09:38:15.000Z

@juergbi why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile? I'm trying to understand the issue and also need FUSE in docker.

Mounting anything (FUSE and other filesystems) requires CAP_SYS_ADMIN privileges even without seccomp restrictions. Outside Docker, unprivileged users can run sshfs with the help of the setuid-root helper binary fusermount. However, in a Docker container setuid fusermount is not supported and hence, sshfs fails unless the Docker container is privileged.

The mentioned unshare command grants CAP_SYS_ADMIN privileges in new user and mount namespaces. This doesn't provide any additional access to the host system, however, it allows mount operations in that new mount namespace. With Linux 4.18 and later, FUSE mounts are allowed in that new mount namespace as well. So sshfs can work inside the new namespaces.

Other container engines may create an unprivileged user namespace as part of container startup, which may allow mounts without the extra unshare step. However, Docker doesn't work that way with its system daemon.
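
To see the effect described above, one can compare the effective capability mask before and after running unshare. A minimal sketch, assuming the usual "CapEff:" line in /proc/<pid>/status and capability bit 21 for CAP_SYS_ADMIN; the helper name `has_cap_sys_admin` is made up for illustration.

```python
import re
from pathlib import Path

CAP_SYS_ADMIN = 21  # bit number defined in <linux/capability.h>


def has_cap_sys_admin(status_text):
    """Check the CapEff line of /proc/<pid>/status for the CAP_SYS_ADMIN bit."""
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no CapEff line found")
    return bool((int(match.group(1), 16) >> CAP_SYS_ADMIN) & 1)


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print("CAP_SYS_ADMIN effective:", has_cap_sys_admin(status.read_text()))
```

Inside the shell started by unshare this should report the capability as present, but it only applies to resources owned by the new user namespace, which is why it grants no extra access to the host.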

Answer 25 · 2020-12-18T12:03:38.000Z

why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?

I'm not sure I understand your question. (The answer seems too obvious to me, so I must be misinterpreting it.)

FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.

I know. I meant that he runs sshfs inside an unshare'd namespace inside Docker.

Answer 26 · 2020-12-18T18:44:00.000Z

I'm trying to understand the issue and also need FUSE in docker.

Slightly unrelated. But I also use FUSE in docker to mount ISO files as a non-root user.

Answer 27 · 2021-04-02T18:52:44.000Z

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.

Ensure the fuse module is loaded

Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json

In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.

In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).

@juergbi: I was able to replicate this setup on Ubuntu 18.04. I used -r instead of -c because the util-linux shipped in Ubuntu 18.04 does not have -c.

FUSE works :) But I want to be able to install Ubuntu packages in the unshared shell (apt-get install foo). I get this error:

W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (1: Operation not permitted)
W: chown to _apt:root of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (1: Operation not permitted)

Do you have any suggestions to work around this?
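
A likely explanation for the "Invalid argument" warnings is that the new user namespace maps only a single uid, and chown to a uid with no mapping fails with EINVAL. The sketch below parses the format of /proc/<pid>/uid_map and checks whether a given uid is covered; the function names and the example uid 100 for _apt are assumptions.

```python
def parse_id_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    return [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def is_mapped(uid, id_map):
    """True if `uid`, as seen inside the namespace, falls in any mapped range."""
    return any(start <= uid < start + length for start, _outside, length in id_map)


if __name__ == "__main__":
    # Typical shape of a single-uid map: only uid 0 exists inside the namespace.
    single = parse_id_map("         0       1000          1\n")
    print("uid 0 mapped:", is_mapped(0, single))
    print("uid 100 mapped:", is_mapped(100, single))  # 100 is a hypothetical _apt uid
```

Running `cat /proc/self/uid_map` in the unshared shell shows the real map; if it has a single line of length 1, any package step that changes ownership to another user can be expected to fail in the same way.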

Answer 28 · 2022-07-10T21:52:27.000Z

Any progress on this?

Answer 29 · 2023-01-15T23:49:18.000Z

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.

Ensure the fuse module is loaded

Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json

In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.

In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).

Can we possibly get a Docker image of this config?

Answer 30 · 2023-02-08T09:50:02.000Z

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.

Ensure the fuse module is loaded

Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json

In the Docker container run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.

In that new shell it's then possible to mount and use FUSE. E.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).

Are there any security implications in doing so? I want to allow untrusted users to access FUSE for rclone mount, but it would be great if they couldn't access the host's filesystem.

Answer 31 · 2023-10-17T15:55:22.000Z

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

* Patch the seccomp profile to drop the restriction on `clone(2)` namespace flags and allow `mount(2)` and `umount(2)`: https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for `profiles/seccomp/default.json` as available in the docker/moby repositories.

* Ensure the `fuse` module is loaded

* Run the Docker container with the options `--device /dev/fuse --security-opt seccomp=/path/to/fuse.json`

* In the Docker container run `unshare -c --keep-caps -m` to open a shell in new unprivileged user and mount namespaces.

* In that new shell it's then possible to mount and use FUSE. E.g., `sshfs user@host:directory /mnt`

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).

@juergbi Thanks to this reply I am 99% of the way to a working setup for using sshfs inside Kubernetes; however, I cannot seem to write to the sshfs mount. I have been able to replicate this on my local machine (without using Docker/Kubernetes). Mounting over sshfs works in an unshared shell, but I cannot write to the mount. Using the exact same mounting command outside of the unshared shell gives me write access, so I am sure it is not an issue on the remote server. Any suggestions on how to fix this?

Answer 32 · 2023-11-21T08:01:45.000Z

any update?

Answer 33 · 2023-11-29T15:03:50.000Z

Any progress?

Answer 34 · 2023-11-29T16:33:58.000Z

I don't think it's helpful to ping everyone for progress updates here. If there is any progress, it will be reported by the ones making it, either in this issue or in a PR (Pull Request).

For future readers: please refrain from commenting every week to ask for updates, as this is inappropriate behavior. This is open source software. If you really want an update, then make it by creating a PR that fixes the issue; otherwise wait for the update.