containers/podman

macOS: The limit of open files in mounted volumes is 64, which is incredibly low

mortie opened this issue · 31 comments

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Volumes mounted with -v have a max open file count of 64. This is way too low to be useful.

Steps to reproduce the issue:

  1. mkdir data-dir
  2. podman run --rm -it -v $(pwd)/data-dir:/data-dir ubuntu:22.04
  3. Run some program which tries to open a bunch of files in /data-dir.

Here's some source code I wrote for testing (and which also reports its resource limits, for good measure):

// nofile.c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/resource.h>

int main(int argc, char **argv) {
	if (argc != 2) {
		printf("Usage: %s <file>\n", argv[0]);
		return 1;
	}

	struct rlimit rlim;
	if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
		perror("getrlimit");
	} else {
		printf("RLIMIT_NOFILE: current: %lu, max: %lu\n", rlim.rlim_cur, rlim.rlim_max);
	}

	int n = 0;
	while (fopen(argv[1], "r")) n += 1;
	printf("Error after %i files: %s (errno %i)\n", n, strerror(errno), errno);
	return 0;
}

If we compile and run that:

  • apt update && apt install gcc
  • gcc -o nofile nofile.c
  • touch /data-dir/test-file
  • ./nofile /data-dir/test-file

Describe the results you received:

The program is only able to open 64 or so files.

Describe the results you expected:

The program should've been able to open a whole lot more than 64. I don't know what would be a good default, but 1048576 (which is used as the default resource limit for everything outside of volumes) seems like a good number.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.2.1
API Version:  4.2.1
Go Version:   go1.18.6
Built:        Tue Sep  6 21:16:02 2022
OS/Arch:      darwin/arm64

Server:       Podman Engine
Version:      4.2.1
API Version:  4.2.1
Go Version:   go1.18.5
Built:        Wed Sep  7 21:59:25 2022
OS/Arch:      linux/arm64

Output of podman info:

host:
  arch: arm64
  buildahVersion: 1.27.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.4-2.fc36.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 98.52
    systemPercent: 0.28
    userPercent: 1.19
  cpus: 8
  distribution:
    distribution: fedora
    variant: coreos
    version: "36"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.19.12-200.fc36.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 6185467904
  memTotal: 8302534656
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc36.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.aarch64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 2h 38m 57.00s (Approximately 0.08 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 106825756672
  graphRootUsed: 3416588288
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 11
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580765
  BuiltTime: Wed Sep  7 21:59:25 2022
  GitCommit: ""
  GoVersion: go1.18.5
  Os: linux
  OsArch: linux/arm64
  Version: 4.2.1

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

==> podman: stable 4.2.1 (bottled), HEAD
Tool for managing OCI containers and pods
https://podman.io/
/opt/homebrew/Cellar/podman/4.2.1 (178 files, 48MB) *
  Poured from bottle on 2022-09-10 at 19:19:22
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0
==> Dependencies
Build: go-md2man ✘, go@1.18 ✘
Required: qemu ✔
==> Options
--HEAD
	Install HEAD version
==> Caveats
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions
==> Analytics
install: 20,426 (30 days), 64,472 (90 days), 211,338 (365 days)
install-on-request: 19,744 (30 days), 62,854 (90 days), 209,364 (365 days)
build-error: 0 (30 days)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Podman is running on macOS running on a Mac with an M1 Pro chip.

Thanks for reaching out!

@ashley-cui @baude can you confirm the observation?

I also ran into the open file limit on macOS with an M1 chip. It seems to be related to this issue.
In my case, MySQL and MongoDB containers got error (24): too many open files during initialization.

@gbraad Do you know if this is easily overridden?

Update: Hmm. It can be reproduced with any file under /Users via podman machine ssh, without any container environment.


It seems it cannot be reproduced without a file on a mounted volume. I wonder whether it is related to 9p and virtio.

I tested on a MacBook Pro with Intel Chip:

Darwin ****** 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct  9 20:14:54 PDT 2022;
root:xnu-8792.41.9~2/RELEASE_X86_64 x86_64

The mount info:

[root@b30cfa454a1f code]# cat /proc/self/mountinfo | grep $PWD
754 701 0:44 /****** /mnt/code rw,relatime - 9p vol0 rw,sync,dirsync,access=client,trans=virtio

The test file on a volume:

[root@b30cfa454a1f code]# ./nofile /mnt/code/test-file
RLIMIT_NOFILE: current: 1048576, max: 1048576
Error after 64 files: Too many open files (errno 24)

The test file on a writable layer instead of a volume:

[root@b30cfa454a1f code]# ./nofile /root/test-file
RLIMIT_NOFILE: current: 1048576, max: 1048576
Error after 1048573 files: Too many open files (errno 24)
[root@b30cfa454a1f code]#

I searched the Linux kernel (fs/9p and net/9p) for EMFILE and found nothing besides the rlimit check. There is nothing like an fd limit scoped to the mount.

I have also run some experiments with dtruss and lsof. Now I believe this problem is caused by macOS.

The qemu-system-x86_64 process is restricted to open file descriptors 192r through 255r (in lsof notation). QEMU's 9p server itself receives EMFILE and passes it on to the guest machine.

A workaround is podman machine stop && ulimit -n unlimited && podman machine start.

A friendly reminder that this issue had no activity for 30 days.

Perhaps if we move to Mac native virtualization this will be fixed.

A friendly reminder that this issue had no activity for 30 days.

Perhaps if we move to Mac native virtualization this will be fixed.

By using something like VZVirtioFileSystemDeviceConfiguration to replace the 9p file system? Is this in the roadmap of podman machine?

Yes this is on the roadmap.

Running into this while trying to build Node.js / Go apps in a container with a large git repo shared from the host to the container. My options seem to be:

  • Switch to Docker Desktop (or another Linux-containers-on-macOS solution) where this issue doesn't exist
  • Override Podman Desktop's handling of the machine / VM runtime to implement a ulimit workaround like the one documented here: #16106 (comment)
  • Wait for Podman (Desktop) to implement Apple's Virtualization Framework / virtiofs

Does that sound about right?

Perhaps if we move to Mac native virtualization this will be fixed.

I seem to recall running into open file limits despite using virtiofs when I was experimenting with it last year. A lot has changed since the last time I touched the virtualization framework though.

If virtiofs somehow bypassed user limits without podman explicitly setting a ulimit on start, I would call that a bug as well as a security issue.

If using Podman with VS Code, VS Code will happily open enough files to almost reach this limit before you start doing anything, so it's easy for a small build job to run out of open files. It would be nice if podman machine start would set the limit itself.

What limit should it set?

I set it to unlimited. As I recall, XNU has three fd limits:

  • The one set in ulimit.
  • A per-UID limit set by sysctl.
  • A global limit set by sysctl.

I think this was the first bug I ever reported in OS X: they didn't change the latter two since they were set in OPENSTEP 4, so they were low enough to break things if you had a lot of apps open. Around 10.4, they increased both and (I think) made them scale with available memory, so even if ulimit is set to unlimited there are two other limits that prevent the system from exhausting kernel resources with too many fds.

Seems reasonable, if a non-admin user can do this.
@baude @ashley-cui WDYT?

I was quite surprised that you can increase ulimit values, but it seems that you can, at least for this one. I am an admin user, but I didn't privilege elevate. Not sure if it's disallowed for non-admin users.
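For anyone curious why no elevation is needed: an unprivileged process may move its soft rlimit anywhere up to the hard limit; only raising the hard limit itself requires privilege. A minimal Go sketch (`raiseSoftToHard` is a hypothetical helper name, not podman code):

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseSoftToHard bumps the soft RLIMIT_NOFILE up to the hard limit.
// This needs no privilege: an unprivileged process may set its soft
// limit to any value up to the hard limit, but cannot raise the hard
// limit itself (nor exceed the kernel/sysctl ceilings).
func raiseSoftToHard() (syscall.Rlimit, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return rl, err
	}
	rl.Cur = rl.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return rl, err
	}
	// Re-read to confirm what the kernel actually applied.
	err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl)
	return rl, err
}

func main() {
	rl, err := raiseSoftToHard()
	if err != nil {
		fmt.Println("Error setting rlimit:", err)
		return
	}
	fmt.Printf("soft=%d hard=%d\n", rl.Cur, rl.Max)
}
```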

baude commented

If I understand this issue (and maybe I don't fully grok it), I think #20612 might resolve this when all the applehv work is done in FCOS and it also merges. Do folks agree?

Checked as of commit 111b233 (2 hours ago). Seems to still be unresolved.

I thought qemu needed its ulimit increased to resolve this, not the services inside the VM.


Update to help illustrate the issue:

The problem is that the virtiofs server provided by qemu inherits the ulimit of the user launching it.

This is my default ulimit:

% ulimit -n
256

% ./bin/darwin/podman run --rm -it -v $HOME/tmp-data-dir:/data-dir ubuntu:22.04

root@86f58d98b8fa:/data-dir# ./nofile /data-dir/test-file
RLIMIT_NOFILE: current: 524288, max: 524288
Error after 70 files: Too many open files (errno 24)

Before starting my podman machine, I increase it to 2000:

% ulimit -n 2000
% ulimit -n
2000


% ./bin/darwin/podman machine init --cpus 4 -m 8188 --now --volume-driver virtfs           
...

% ./bin/darwin/podman run --rm -it -v $HOME/tmp-data-dir:/data-dir ubuntu:22.04 

root@6d74215c310c:/# cd data-dir/
root@6d74215c310c:/data-dir# apt update && apt install -y gcc
root@6d74215c310c:/data-dir# gcc -o nofile nofile.c
root@6d74215c310c:/data-dir# ls -lah
total 16K
drwxr-xr-x. 5 root nogroup  160 Nov  7 19:15 .
dr-xr-xr-x. 1 root root      77 Nov  7 19:14 ..
-rwxr-xr-x. 1 root nogroup 9.0K Nov  7 19:15 nofile
-rw-r--r--. 1 root nogroup  630 Nov  7 18:45 nofile.c
-rw-r--r--. 1 root nogroup    0 Nov  7 18:46 test-file


root@6d74215c310c:/data-dir# ./nofile /data-dir/test-file
RLIMIT_NOFILE: current: 524288, max: 524288



^C^C^C^C^Z^C^C^Z^C^C^C^Z^C^Z^C^C
^ --- This ends up hanging, because it's in D state and isn't fast enough to reach 2000 files open at once

root@6d74215c310c:/data-dir# jobs -l
root@6d74215c310c:/data-dir# 
exit

% ./bin/darwin/podman machine stop
Waiting for VM to exit...
Machine "podman-machine-default" stopped successfully


% ./bin/darwin/podman machine rm  

In contrast, if I lower the open file limit to 80 (I tried 50, but it was too low), we see that the number of open files reached is just 21:

% ulimit -n                     
2000

% ulimit -n 80
% ulimit -n   
80

% ./bin/darwin/podman machine stop                                              
Waiting for VM to exit...
Machine "podman-machine-default" stopped successfully

% ./bin/darwin/podman machine start
Starting machine "podman-machine-default"
Waiting for VM ...
Mounting volume... /Users:/Users
Mounting volume... /private:/private
Mounting volume... /var/folders:/var/folders

% ./bin/darwin/podman run --rm -it -v $HOME/tmp-data-dir:/data-dir ubuntu:22.04 

root@0f00a3296fe5:/data-dir# ./nofile /data-dir/test-file
RLIMIT_NOFILE: current: 524288, max: 524288
Error after 21 files: Too many open files (errno 24)

The fix for this is to update the open file limit like this:

err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{
	Max: syscall.RLIM_INFINITY,
	Cur: syscall.RLIM_INFINITY,
})
if err != nil {
	fmt.Println("Error Setting Rlimit ", err)
}

It needs to happen before the qemu process is started:

err := cmd.Start()

What I did for testing probably results in the limit always being updated, but it does work. I added an init function here to test the ulimit changes in Go: https://github.com/containers/podman/blob/main/pkg/machine/qemu/machine_unix.go#L14

func init() {
	err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{
		Max: syscall.RLIM_INFINITY,
		Cur: syscall.RLIM_INFINITY,
	})
	if err != nil {
		fmt.Println("Error Setting Rlimit ", err)
	}
}

Please open a PR to make this change.

Limits are a security/safety feature. The suggestion I made is too broadly scoped. I will follow up in a few hours with a PR that is more narrowly scoped to address this issue.

I believe the final change should only affect macOS, FreeBSD, and Linux when qemu is being used to serve files via virtiofs or 9p.

Also... I'm starting to question whether this should actually be addressed in qemu instead. I can't recall ever running into issues with the 9p server in qemu, but that's because 9p is functionally different from virtiofs...

Feedback on this is welcome, while I put together my PR today. 🤔

It's not exactly what I wanted to do, but this is what I landed on for resolving this: #20643

Merging #20643 fixes this.

@mortie @protosam Can this be closed now?

@sanmai-NL as far as I can tell, this is solved. So I want to say: yes.

I still end up with an error when doing an npm install; I might be doing something wrong. I am on an Intel Mac (2.6 GHz 6-Core Intel Core i7) running Sonoma.

npm ERR! EMFILE: too many open files, open
podman --version
podman version 4.9.1
scratch % ulimit -n
unlimited

scratch % podman machine init --image-path next --cpus 10 --memory 10240 --now --volume-driver virtfs

scratch % podman run --rm -it -v $HOME/scratch:/data-dir ubuntu:22.04
root@2dd76789a1d6:/# ulimit -n
524288

@plessbd use the ulimit flag on your podman run: --ulimit=nofile=1024:1024. The problem was that the host layer was preventing the container from opening the number of files its ulimit was set to. The test can open 524288 of the 524288 files permitted by the ulimit, instead of being throttled to my user's limit of 256.

@sanmai-NL upon testing today, the PR I made prior seems to work. I upgraded to version 4.9.1 with homebrew.

@plessbd use the ulimit flag on your podman run: --ulimit=nofile=1024:1024. The problem was that the host layer was preventing the container from opening the number of files its ulimit was set to. The test can open 524288 of the 524288 files permitted by the ulimit, instead of being throttled to my user's limit of 256.

That worked! Thank you

Technically, since I was using podman-compose, I set the default for everything (better to do it in the compose file, but I was lazy for testing):

cat ~/.local/share/containers/containers.conf
[containers]
default_ulimits = [
 "nofile=65535:65535",
]

@sanmai-NL I still think we're good to close the issue.