racy RLIMIT_NOFILE setting with Go 1.19+
jrivera-px opened this issue · 13 comments
Description
We noticed an issue with 'ubuntu:bionic' where runc does not continually honor "RLIMIT_NOFILE " setting defined in the spec. After configuring the setting to '65536 ' Its keeps flipping between '65536' & '1024' intermittently. Does not appear to be consistent. Looking back, this issue appears to have started happening in runc-1.1.10 and above. Hoping that someone can provide a fix or some insight to why this is happening. The reproduction is using the latest runc-1.1.12
Steps to reproduce the issue
Running on an 'ubuntu:focal (20.04.6 LTS )' server with kernel 5.4.0-164-generic, using bash shell
- As root create 'rootfs' directory: mkdir -p Issue/bionic/rootfs"
- Pull ubuntu:bionic and unpack rootfs: docker export $(docker run -d ubuntu:bionic) |tar -C Issue/bionic/rootfs -x
- Get runc-1.1.12: cd Issue && wget -q https://github.com/opencontainers/runc/releases/download/v1.1.12/runc.amd64 -O runc-1.1.12 && chmod +x runc-1.1.12
- Create initial spec: cd bionic && ../runc-1.1.12 spec
- Update 'RLIMIT_NOFILE' settings in spec : sed -i -e 's/"hard": 1024/"hard": 65536/' -e 's/"soft": 1024/"soft": 65536/' config.json
- In another window on the same server as root in the same directory 'Issue' directory created in step 1 above. Start a monitoring process which will continually exec into the container and display the ulimit 'nofiles' setting. Install 'ts' utility if not present: apt-get install moreutils. Since we are specifying in the configuration that the setting should be '65536' we cut down on the output by searching for 1024. Printing a start date and timestamp to show the time of the flipping happens: date; while true; do sudo $(pwd)/runc-1.1.12 exec -t bionic /bin/sh -c 'ulimit -a' 2>&1 | grep nofiles; sleep 0.1; done | ts '%F %T' | egrep 1024
- Coming back to the original window start the runc bionic container: cd .. && ./runc-1.1.12 run -b $(pwd)/bionic bionic
Describe the results you received and expected
In the monitoring window you initially just see a date, then intermittently you will see lines appear: nofiles 1024
e.g.
date; while true; do sudo $(pwd)/runc-1.1.12 exec -t bionic /bin/sh -c 'ulimit -a' 2>&1 | grep nofiles; sleep 0.1; done | ts '%F %T' | egrep 1024
Wed 07 Feb 2024 07:34:12 PM UTC
2024-02-07 19:35:18 nofiles 1024
2024-02-07 19:35:44 nofiles 1024
2024-02-07 19:36:08 nofiles 1024
2024-02-07 19:37:08 nofiles 1024
I would have expected 'nofiles' to never flip back to 1024 and remain at '65536' so we should never see any output.
What version of runc are you using?
./runc-1.1.12 --version
runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4
Host OS information
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Host kernel information
Linux ip-10-13-108-144.pwx.purestorage.com 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Sorry, I can't reproduce it with your provided step.
date; while true; do sudo /workspaces/runc/runc exec -t test /bin/sh -c 'ulimit -a' 2>&1 | grep nofiles; sleep 0.1; done | ts '%F %T'
I printed all logs out, it looks like:
2024-02-08 02:07:44 nofiles 65536
2024-02-08 02:07:44 nofiles 65536
2024-02-08 02:07:45 nofiles 65536
@lifubang was looking to see if there was any progress made on this issue. We are looking to get an update for this because of vulnerabilities which are being flagged for go1.20.3
Sorry, I just returned from vacation and haven't reproduced your question yet. If you could take the time to point out the problem, it would be greatly appreciated.
I successfully reproduced this problem. I think the root cause is the changes related to rlimit-nofile in the new version of go. I've submitted a PR which should fix this issue.
Reproduced locally, 40 times out of 1 million.
#4237 states this is because of a change in Go 1.21 (which was also backported to Go 1.20.4). Yet I reproduced it with Go 1.19:
$ RUNC=~/git/runc/runc-4239-go1.19; i=0; $RUNC --version && while [ $i -lt 100000 ]; do LIM=$(sudo $RUNC exec bionic sh -c "ulimit -n"); if [ $LIM -ne 65536 ]; then echo "WHOOPSIE (iter $i) got numfile $LIM"; break; fi; ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i"; let i++; done
runc version 1.1.12+dev
commit: v1.0.0-754-gec700396
spec: 1.0.2-dev
go: go1.19.13
libseccomp: 2.5.1
[Tue 02 Apr 2024 06:43:29 PM PDT] 0
[Tue 02 Apr 2024 06:43:30 PM PDT] 100
....
[Tue 02 Apr 2024 06:48:35 PM PDT] 22800
[Tue 02 Apr 2024 06:48:36 PM PDT] 22900
WHOOPSIE (iter 22945) got numfile 1024
This issue states only runc v1.1.10+ is affected. I have reproduced it with earlier version:
unc version 1.1.9
commit: v1.1.9-0-gccaecfcb
spec: 1.0.2-dev
go: go1.19.13
libseccomp: 2.5.1
[sudo] password for kir:
[Tue 02 Apr 2024 07:37:45 PM PDT] 0
[Tue 02 Apr 2024 07:37:47 PM PDT] 100
...
[Tue 02 Apr 2024 07:39:19 PM PDT] 7000
WHOOPSIE (iter 7076) got numfile 1024
Will try older versions of runc and go tomorrow.
Apparently runc version 1.1.0, compiled with go1.17.13, does not have the issue.
I was able to reproduce this with runc-1.1.0 compiled with Go 1.19.13. So the problem is not in runc per se, it is in Go runtime (as pointed out in #4237).
Alas, the fix that I proposed in #4239 is not working for some reason.
Testing whether we have the issue in the main branch (I was thinking we do not, but no longer sure).
Testing whether we have the issue in the main branch (I was thinking we do not, but no longer sure).
Yes we do (with the same script as above):
runc version 1.1.0+dev
commit: v1.1.0-994-g4641f17e
spec: 1.2.0
go: go1.21.7
libseccomp: 2.5.1
[Tue 02 Apr 2024 10:04:19 PM PDT] 0
[Tue 02 Apr 2024 10:04:21 PM PDT] 100
[Tue 02 Apr 2024 10:04:22 PM PDT] 200
...
[Tue 02 Apr 2024 10:05:49 PM PDT] 5000
WHOOPSIE (iter 5089) got numfile 1024
IMHO this is ultimately a Go stdlib bug -- they assume that only the Go process itself will change its own rlimits (and thus use a cached version of the rlimit), but prlimit(2)
lets you set another process's rlimits and it's totally normal to do this (we set the child process's rlimit in the parent process -- set setupRlimits
and the callers in libcontainer/process_linux.go
).
As a workaround, we could try to set our own limits -- but unlike the solution proposed in #4239, we should use the configured limit, not the one it happens to be at the time. But IMHO this is clearly a Go bug...
@kolyshkin This is why updating golang.org/x/sys/unix
didn't help -- the change to golang.org/x/sys/unix
was to make unix.Setrlimit
update the syscall
-cached value, but we are setting the limits using unix.Prlimit
from another process -- IMHO the caching design in the Go stdlib is wrong. If they really want to do their "recover the old values" as well as "automagically bump the soft limit" thing, they should check if the current values have changed since the runtime last bumped the soft limit. This would still have a race (because you cannot atomically cmpxchg the settings, though you could detect an in-flight change with prlimit
since that is a one-shot get/set thing) but it would at least handle prlimit(2)
somewhat...