os: Readdirnames throws errors on CIFS
rfjakob opened this issue · 14 comments
What version of Go are you using (go version)?
$ go version go version go1.14.3 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/home/jakob/.cache/go-build" GOENV="/home/jakob/.config/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GOINSECURE="" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/home/jakob/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/local/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="/home/jakob/go/src/github.com/rfjakob/gocryptfs/go.mod" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build655782218=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Run readdirnames.go against a cifs mount:
while sleep 1 ; do ./readdirnames /mnt/synology/public/tmp/g1 ; done
Mount info:
//192.168.0.3/public on /mnt/synology/public type cifs (rw,relatime,vers=3.1.1,cache=strict,username=jakob,uid=1026,forceuid,gid=1026,forcegid,addr=192.168.0.3,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
Linux kernel:
5.6.13-300.fc32.x86_64
What did you expect to see?
No errors, consistent results, like from the C program getdents.c
$ while sleep 1 ; do ./getdents_c /mnt/synology/public/tmp/g1 ; done
getdents64 fd3: n=4176, errno=0
getdents64 fd3: n=4176, errno=0
getdents64 fd3: n=4176, errno=0
getdents64 fd3: n=4176, errno=0
getdents64 fd3: n=4176, errno=0
getdents64 fd3: n=3192, errno=0
getdents64 fd3: n=0, errno=0
total 24072 bytes
[repeats ad infinitum]
What did you see instead?
$ while sleep 1 ; do ./readdirnames /mnt/synology/public/tmp/g1 ; done
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=694, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=346, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=346, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=346, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=520, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=694, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
2020/05/24 23:59:25 os.Open returned err=open /mnt/synology/public/tmp/g1: interrupted system call
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
The problem is easier to trigger when the terminal window is continously resized (this triggers SIGWINCH signals).
This is problably the same underlying issue as in #24015 , and I agree that it looks like a kernel bug.
However, the C program seems to handle this fine, even with window resizing stress. So if glibc has workarounds for this, maybe Go should have them as well.
Downstream issue: rfjakob/gocryptfs#483
This seems like the same issue as #38836. Is it possible for you to check whether the problem is fixed on master? Thanks.
Yes can do. Looks the same on master:
$ go build
$ go version
go version devel +f65ad0dda7 Sun May 24 20:49:55 2020 +0000 linux/amd64
$ while sleep 1 ; do ./readdirnames /mnt/synology/public/tmp/g1 ; done
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
2020/05/25 08:20:36 os.Open returned err=open /mnt/synology/public/tmp/g1: interrupted system call
Readdirnames: len=172, err=readdirent: no such file or directory
Readdirnames: len=868, err=readdirent: no such file or directory
Readdirnames: len=346, err=readdirent: no such file or directory
Readdirnames: len=346, err=readdirent: no such file or directory
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
Readdirnames: len=1001, err=<nil>
^C
I posted a summary of the issue to linux-cifs and linux-kernel a few day ago:
https://lore.kernel.org/linux-cifs/2d735cb666872099b2c4c574e35aaeac2f7356ad.camel@gmail.com/
No reply yet, but I'm pretty sure now that this is a kernel bug. Does it make sense to keep this ticket open for tracking the issue?
@gopherbot remove WaitingForInfo
Change https://golang.org/cl/236997 mentions this issue: os: always check for EINTR in calls to open
I'm going to add a check for EINTR in calls to open. I think that should fix the problem with Go.
Nice, that should fix the open problem. The ENOENT in the middle of readdirent is tricky, though.
Readdirnames: len=868, err=readdirent: no such file or directory
Quite right, sorry, it will only fix the problem with os.Open. I don't know what to do with the readdirent problem, but it sounds like that is a kernel bug, and not something we can avoid in Go. So I'm going to close this issue now. Please comment if you disagree.
Rclone users have been having the same problem (see forum)
Using go1.15rc1 appears to fix the EINTR errors but does not fix the ENOENT errors.
Using export GODEBUG=asyncpreemptoff=1 does fix the problem completely.
So I guess CIFS is incorrectly returning ENOENT instead of EINTR on receipt of a signal?
@rfjakob did you get anywhere with trying to report this to the CIFS developers or upstream? I don't see any replies to your message.
Hi, no, I did not get any response to my posting, unfortunately. I don't really know what going on inside CIFS, but from the debug log I got out of the kernel, this line looks like something goes wrong internally:
[168517.187072] Status code returned 0x80000006 STATUS_NO_MORE_FILES
I don't understand why this does not happen with a C program. I can hammer getdents.c with SIGWINCH signals as much as I want, but I don't get ENOENT errors.
GODEBUG=asyncpreemptoff=1 also seems to fix it here, thanks for the tip!
hammer getdents.c with SIGWINCH signals
The Go runtime uses SIGURG (I believe), so perhaps that's what's tripping CIFS?
I don't understand why this does not happen with a C program. I can hammer getdents.c with SIGWINCH signals as much as I want, but I don't get ENOENT errors.
Note that an EINTR error only occurs if a signal handler is installed using the sigaction system call.
We are often seeing this happening in our system when closing an open File located in mounted Azure File Share(which I believe is CISF system). Our go version is 1.15.x which according this kubernetes/kubernetes#92521 would have the fixed for it.
Now, I confuse whether to use export GODEBUG=asyncpreemptoff=1 or not.
@meetme2meat, can you create a new issue describing your problem, with directions to reproduce it and the error results that you're seeing?