macOS: The limit of open files in mounted volumes is 64, which is incredibly low
mortie opened this issue · 31 comments
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Volumes mounted with -v have a max open file count of 64. This is way too low to be useful.
Steps to reproduce the issue:
- mkdir data-dir
- podman run --rm -it -v $(pwd)/data-dir:/data-dir ubuntu:22.04
- Run some program which tries to open a bunch of files in /data-dir.
Here's some source code I wrote for testing (and which also reports its resource limits, for good measure):
// nofile.c: report RLIMIT_NOFILE, then open the given file repeatedly until it fails.
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/resource.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("Usage: %s <file>\n", argv[0]);
        return 1;
    }

    struct rlimit rlim;
    if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
        perror("getrlimit");
    } else {
        printf("RLIMIT_NOFILE: current: %lu, max: %lu\n",
               (unsigned long)rlim.rlim_cur, (unsigned long)rlim.rlim_max);
    }

    /* Leak FILE handles on purpose and count how many opens succeed. */
    int n = 0;
    while (fopen(argv[1], "r")) n += 1;
    printf("Error after %i files: %s (errno %i)\n", n, strerror(errno), errno);
    return 0;
}
If we compile and run that:
apt update && apt install gcc
gcc -o nofile nofile.c
touch /data-dir/test-file
./nofile /data-dir/test-file
Describe the results you received:
The program is only able to open 64 or so files.
Describe the results you expected:
The program should've been able to open a whole lot more than 64. I don't know what would be a good default, but 1048576 (which is used as the default resource limit for everything outside of volumes) seems like a good number.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Client: Podman Engine
Version: 4.2.1
API Version: 4.2.1
Go Version: go1.18.6
Built: Tue Sep 6 21:16:02 2022
OS/Arch: darwin/arm64
Server: Podman Engine
Version: 4.2.1
API Version: 4.2.1
Go Version: go1.18.5
Built: Wed Sep 7 21:59:25 2022
OS/Arch: linux/arm64
Output of podman info:
host:
arch: arm64
buildahVersion: 1.27.0
cgroupControllers:
- cpuset
- cpu
- io
- memory
- pids
- misc
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.4-2.fc36.aarch64
path: /usr/bin/conmon
version: 'conmon version 2.1.4, commit: '
cpuUtilization:
idlePercent: 98.52
systemPercent: 0.28
userPercent: 1.19
cpus: 8
distribution:
distribution: fedora
variant: coreos
version: "36"
eventLogger: journald
hostname: localhost.localdomain
idMappings:
gidmap: null
uidmap: null
kernel: 5.19.12-200.fc36.aarch64
linkmode: dynamic
logDriver: journald
memFree: 6185467904
memTotal: 8302534656
networkBackend: netavark
ociRuntime:
name: crun
package: crun-1.6-2.fc36.aarch64
path: /usr/bin/crun
version: |-
crun version 1.6
commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
remoteSocket:
exists: true
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: true
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.0-0.2.beta.0.fc36.aarch64
version: |-
slirp4netns version 1.2.0-beta.0
commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
libslirp: 4.6.1
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.3
swapFree: 0
swapTotal: 0
uptime: 2h 38m 57.00s (Approximately 0.08 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
volume:
- local
registries:
search:
- docker.io
store:
configFile: /usr/share/containers/storage.conf
containerStore:
number: 1
paused: 0
running: 1
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev,metacopy=on
graphRoot: /var/lib/containers/storage
graphRootAllocated: 106825756672
graphRootUsed: 3416588288
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
imageCopyTmpDir: /var/tmp
imageStore:
number: 11
runRoot: /run/containers/storage
volumePath: /var/lib/containers/storage/volumes
version:
APIVersion: 4.2.1
Built: 1662580765
BuiltTime: Wed Sep 7 21:59:25 2022
GitCommit: ""
GoVersion: go1.18.5
Os: linux
OsArch: linux/arm64
Version: 4.2.1
Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):
==> podman: stable 4.2.1 (bottled), HEAD
Tool for managing OCI containers and pods
https://podman.io/
/opt/homebrew/Cellar/podman/4.2.1 (178 files, 48MB) *
Poured from bottle on 2022-09-10 at 19:19:22
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/podman.rb
License: Apache-2.0
==> Dependencies
Build: go-md2man ✘, go@1.18 ✘
Required: qemu ✔
==> Options
--HEAD
Install HEAD version
==> Caveats
zsh completions have been installed to:
/opt/homebrew/share/zsh/site-functions
==> Analytics
install: 20,426 (30 days), 64,472 (90 days), 211,338 (365 days)
install-on-request: 19,744 (30 days), 62,854 (90 days), 209,364 (365 days)
build-error: 0 (30 days)
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
Podman is running on macOS running on a Mac with an M1 Pro chip.
Thanks for reaching out!
@ashley-cui @baude can you confirm the observation?
I also ran into the open files limit on macOS with an M1 chip. It seems to be related to this issue.
In my case, MySQL and MongoDB containers hit error (24): too many open files during initialization.
Updated: Hmm. It can be reproduced with any file under /Users via podman machine ssh, without any container involved.
It seems it cannot be reproduced without a file on a mounted volume. I wonder whether it is related to 9p and virtio.
I tested on a MacBook Pro with an Intel chip:
Darwin ****** 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct 9 20:14:54 PDT 2022;
root:xnu-8792.41.9~2/RELEASE_X86_64 x86_64
The mount info:
[root@b30cfa454a1f code]# cat /proc/self/mountinfo | grep $PWD
754 701 0:44 /****** /mnt/code rw,relatime - 9p vol0 rw,sync,dirsync,access=client,trans=virtio
The test file on a volume:
[root@b30cfa454a1f code]# ./nofile /mnt/code/test-file
RLIMIT_NOFILE: current: 1048576, max: 1048576
Error after 64 files: Too many open files (errno 24)
The test file on a writable layer instead of a volume:
[root@b30cfa454a1f code]# ./nofile /root/test-file
RLIMIT_NOFILE: current: 1048576, max: 1048576
Error after 1048573 files: Too many open files (errno 24)
[root@b30cfa454a1f code]#
I searched the linux kernel (fs/9p and net/9p) for EMFILE and found nothing besides rlimit. There is nothing like an 'fd limit at the mount scope'.
I have also run some experiments with dtruss and lsof. Now I believe this problem is caused by macOS.
The qemu-system-x86_64 process is restricted to open file descriptors in the range 192r to 255r. The QEMU (9p server) itself receives the errno EMFILE and passes it to the guest machine.
A workaround is podman machine stop && ulimit -n unlimited && podman machine start.
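To make the failure mode concrete, here is a minimal Go sketch (an illustration of the mechanism only, not Podman or QEMU code, using a hypothetical limit of 64): once a process reaches its RLIMIT_NOFILE soft limit, further opens fail with EMFILE (errno 24), which is the errno the 9p server then forwards to the guest.

// emfile_demo.go: lower this process's RLIMIT_NOFILE, then open files until
// the kernel returns EMFILE, the same host-side condition the QEMU 9p server
// hits when it inherits a small ulimit from the launching shell.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Assumption: 64 stands in for the small limit inherited from the shell.
	lim := syscall.Rlimit{Cur: 64, Max: 64}
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		fmt.Println("setrlimit:", err)
		return
	}

	n := 0
	for {
		// Descriptors are leaked on purpose; we only count successful opens.
		if _, err := syscall.Open("/etc/hosts", syscall.O_RDONLY, 0); err != nil {
			fmt.Printf("error after %d opens: %v\n", n, err)
			return
		}
		n++
	}
}

Raising the shell's ulimit before podman machine start, as in the workaround above, raises the limit that qemu inherits, which is why it helps.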
A friendly reminder that this issue had no activity for 30 days.
Perhaps if we move to Mac native virtualization this will be fixed.
A friendly reminder that this issue had no activity for 30 days.
Perhaps if we move to Mac native virtualization this will be fixed.
By using something like VZVirtioFileSystemDeviceConfiguration to replace the 9p file system? Is this on the roadmap for podman machine?
Yes this is on the roadmap.
Running into this while trying to build Node.js / Go apps in a container with a large git repo shared from the host to the container. My options seem to be:
- Switch to Docker Desktop (or another Linux-containers-on-macOS solution) where this issue doesn't exist
- Override Podman Desktop's handling of the machine / VM runtime to implement a ulimit workaround like the one documented here: #16106 (comment)
- Wait for Podman (Desktop) to implement Apple's Virtualization Framework / virtiofs
Does that sound about right?
Perhaps if we move to Mac native virtualization this will be fixed.
I seem to recall running into open file limits despite using virtiofs when I was experimenting with it last year. A lot has changed since the last time I touched the virtualization framework though.
If for some reason virtiofs would bypass user limits without podman explicitly setting ulimit on start, I would call it a bug as well as a security issue.
If using Podman with VS Code, VS Code will happily open enough files to almost reach this limit before you start doing anything, so it's easy for a small build job to run out of open files. It would be nice if podman machine start would set the limit itself.
What limit should it set?
I set it to unlimited. As I recall, XNU has three fd limits:
- The one set in ulimit.
- A per-UID limit set by sysctl
- A global limit set by sysctl.
I think this was the first bug I ever reported in OS X: they didn't change the latter two since they were set in OPENSTEP 4, so they were low enough to break things if you had a lot of apps open. Around 10.4, they increased both and (I think) made them scale with available memory, so even if ulimit is set to unlimited there are still two further limits that prevent the system from exhausting kernel resources with too many fds.
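For anyone who wants to check their own machine, here is a small Go sketch (macOS only) that reads all three layers; it assumes the commonly referenced sysctl names kern.maxfilesperproc and kern.maxfiles, which may not map exactly onto the per-UID description above.

//go:build darwin

// fdlimits_darwin.go: print the process rlimit plus the two sysctl-level caps.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var rlim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err == nil {
		fmt.Printf("ulimit (RLIMIT_NOFILE): cur=%d max=%d\n", rlim.Cur, rlim.Max)
	}
	if v, err := syscall.SysctlUint32("kern.maxfilesperproc"); err == nil {
		fmt.Println("kern.maxfilesperproc:", v)
	}
	if v, err := syscall.SysctlUint32("kern.maxfiles"); err == nil {
		fmt.Println("kern.maxfiles:", v)
	}
}

The shell equivalent is sysctl kern.maxfiles kern.maxfilesperproc.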
Seems reasonable, if a non-admin user can do this.
@baude @ashley-cui WDYT?
I was quite surprised that you can increase ulimit values, but it seems that you can, at least for this one. I am an admin user, but I didn't privilege elevate. Not sure if it's disallowed for non-admin users.
If I understand this issue (and maybe I don't fully grok it), I think #20612 might resolve this when all the applehv work is done in FCOS and it also merges. Do folks agree?
Checked as of commit 111b233 (2 hours ago). Seems to still be unresolved.
I thought qemu needed its ulimit increased to resolve this, not the services inside the VM.
Update to help illustrate the issue:
The problem is that the virtiofs server provided by qemu is inheriting the ulimit of the user launching it.
This is my default ulimit:
% ulimit -n
256
% ./bin/darwin/podman run --rm -it -v $HOME/tmp-data-dir:/data-dir ubuntu:22.04
root@86f58d98b8fa:/data-dir# ./nofile /data-dir/test-file
RLIMIT_NOFILE: current: 524288, max: 524288
Error after 70 files: Too many open files (errno 24)
Before starting my podman machine, I increase it to 2000:
% ulimit -n 2000
% ulimit -n
2000
% ./bin/darwin/podman machine init --cpus 4 -m 8188 --now --volume-driver virtfs
...
% ./bin/darwin/podman run --rm -it -v $HOME/tmp-data-dir:/data-dir ubuntu:22.04
root@6d74215c310c:/# cd data-dir/
root@6d74215c310c:/data-dir# apt update && apt install -y gcc
root@6d74215c310c:/data-dir# gcc -o nofile nofile.c
root@6d74215c310c:/data-dir# ls -lah
total 16K
drwxr-xr-x. 5 root nogroup 160 Nov 7 19:15 .
dr-xr-xr-x. 1 root root 77 Nov 7 19:14 ..
-rwxr-xr-x. 1 root nogroup 9.0K Nov 7 19:15 nofile
-rw-r--r--. 1 root nogroup 630 Nov 7 18:45 nofile.c
-rw-r--r--. 1 root nogroup 0 Nov 7 18:46 test-file
root@6d74215c310c:/data-dir# ./nofile /data-dir/test-file
RLIMIT_NOFILE: current: 524288, max: 524288
^C^C^C^C^Z^C^C^Z^C^C^C^Z^C^Z^C^C
^ --- This ends up hanging, because it's in D state and isn't fast enough to reach 2000 open files at once
root@6d74215c310c:/data-dir# jobs -l
root@6d74215c310c:/data-dir#
exit
% ./bin/darwin/podman machine stop
Waiting for VM to exit...
Machine "podman-machine-default" stopped successfully
% ./bin/darwin/podman machine rm
In contrast, if I lower the open file limit to 80 (I tried 50, but it was too low), we see that the number of open files reached is just 21:
% ulimit -n
2000
% ulimit -n 80
% ulimit -n
80
% ./bin/darwin/podman machine stop
Waiting for VM to exit...
Machine "podman-machine-default" stopped successfully
% ./bin/darwin/podman machine start
Starting machine "podman-machine-default"
Waiting for VM ...
Mounting volume... /Users:/Users
Mounting volume... /private:/private
Mounting volume... /var/folders:/var/folders
% ./bin/darwin/podman run --rm -it -v $HOME/tmp-data-dir:/data-dir ubuntu:22.04
root@0f00a3296fe5:/data-dir# ./nofile /data-dir/test-file
RLIMIT_NOFILE: current: 524288, max: 524288
Error after 21 files: Too many open files (errno 24)
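These counts track whatever the launching shell's soft limit was, because RLIMIT_NOFILE is inherited across exec. As a purely illustrative Go sketch (hypothetical, not Podman code), re-executing a binary after lowering its limit shows the child reporting the lowered value, just as qemu inherits the terminal's ulimit:

// inherit_demo.go: lower our own RLIMIT_NOFILE, then re-exec ourselves as a
// stand-in for qemu; the child prints the inherited (lowered) limit.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	if len(os.Args) > 1 && os.Args[1] == "child" {
		var rlim syscall.Rlimit
		if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err == nil {
			fmt.Printf("child RLIMIT_NOFILE: cur=%d max=%d\n", rlim.Cur, rlim.Max)
		}
		return
	}

	// Assumption: 80 mirrors the shell limit used in the test above.
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{Cur: 80, Max: 80}); err != nil {
		fmt.Println("setrlimit:", err)
		return
	}

	cmd := exec.Command(os.Args[0], "child")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println("run:", err)
	}
}

Presumably the container sees fewer successful opens than the raw limit (70 instead of 256, 21 instead of 80) because qemu already holds descriptors for the VM image, sockets, and so on.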
The fix for this is to update the open file limit like this:
err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{
	Max: syscall.RLIM_INFINITY,
	Cur: syscall.RLIM_INFINITY,
})
if err != nil {
	fmt.Println("Error Setting Rlimit ", err)
}
It needs to happen before the qemu process is started: podman/pkg/machine/qemu/machine.go, line 490 (as of commit 111b233).
What I did for testing probably results in the limit always being updated, but it does work. I added an init function here to test the ulimit changes in Go: https://github.com/containers/podman/blob/main/pkg/machine/qemu/machine_unix.go#L14
func init() {
	// Raise RLIMIT_NOFILE for this process so the qemu child started later
	// (which serves the virtiofs/9p mounts) inherits the higher limit.
	err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{
		Max: syscall.RLIM_INFINITY,
		Cur: syscall.RLIM_INFINITY,
	})
	if err != nil {
		fmt.Println("Error Setting Rlimit ", err)
	}
}
Please open a PR to make this change.
Limits are a security/safety feature. The suggestion I made is too broadly scoped. I will follow up in a few hours with a PR that is more narrowly scoped to address this issue.
I believe the final change should only affect macOS, FreeBSD, and Linux when qemu is being used to serve files over virtiofs or 9p.
Also... I'm starting to question whether this should actually be addressed in qemu instead. I can't recall ever running into issues with the 9p server in qemu, but that's because 9p is functionally different from virtiofs...
Feedback on this is welcome while I put together my PR today. 🤔
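For what it's worth, one narrower option would look roughly like the sketch below (this is an assumption about scoping, not necessarily what the eventual PR does; the raiseNOFILESoftLimit helper, its package placement, and the call site are hypothetical): raise only the soft limit, only up to what macOS allows, and only on the darwin qemu code path right before the qemu process is launched.

//go:build darwin

package qemu

import (
	"fmt"
	"syscall"
)

// raiseNOFILESoftLimit raises this process's RLIMIT_NOFILE soft limit so that
// the qemu child serving virtiofs/9p mounts inherits a usable limit.
func raiseNOFILESoftLimit() error {
	var rlim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
		return fmt.Errorf("getting RLIMIT_NOFILE: %w", err)
	}

	// setrlimit(2) on macOS has historically rejected RLIM_INFINITY as a soft
	// value for RLIMIT_NOFILE, so cap the target at kern.maxfilesperproc.
	target := rlim.Max
	if maxPerProc, err := syscall.SysctlUint32("kern.maxfilesperproc"); err == nil {
		if target > uint64(maxPerProc) {
			target = uint64(maxPerProc)
		}
	}
	if rlim.Cur >= target {
		return nil // already high enough; don't touch it
	}

	rlim.Cur = target
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
		return fmt.Errorf("raising RLIMIT_NOFILE soft limit to %d: %w", target, err)
	}
	return nil
}

Never raising the hard limit and never lowering anything keeps the safety property that the process is granted no more than the user's existing hard limit.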
It's not exactly what I wanted to do, but this is what I landed on for resolving this: #20643
@sanmai-NL as far as I can tell, this is solved. So I want to say: yes.
I still end up with an error when doing an npm install; I might be doing something wrong. I am on an Intel Mac (2.6 GHz 6-core Intel Core i7) with Sonoma.
npm ERR! EMFILE: too many open files, open
podman --version
podman version 4.9.1
scratch % ulimit -n
unlimited
scratch % podman machine init --image-path next --cpus 10 --memory 10240 --now --volume-driver virtfs
scratch % podman run --rm -it -v $HOME/scratch:/data-dir ubuntu:22.04
root@2dd76789a1d6:/# ulimit -n
524288
@plessbd use the --ulimit flag on your podman run: --ulimit=nofile=1024:1024. The problem was that the host layer was preventing the container from opening the number of files its ulimit was set to. The test can open 524288 out of 524288 files permitted by the ulimit, instead of being throttled to my user's limit of 256.
@sanmai-NL upon testing today, the PR I made prior seems to work. I upgraded to version 4.9.1 with Homebrew.
@plessbd use the --ulimit flag on your podman run: --ulimit=nofile=1024:1024. The problem was that the host layer was preventing the container from opening the number of files its ulimit was set to. The test can open 524288 out of 524288 files permitted by the ulimit, instead of being throttled to my user's limit of 256.
That worked! Thank you
Technically, since I was using podman-compose, I set the default for everything (better to do it in the compose file, but I was lazy for testing):
cat ~/.local/share/containers/containers.conf
[containers]
default_ulimits = [
"nofile=65535:65535",
]
@sanmai-NL I still think we're good to close the issue.