golang/go

runtime: SIGSEGV in mapassign_fast64 during cmd/vet

myitcv opened this issue · 12 comments

What version of Go are you using (go version)?

$ go version
go version devel +0ac8739ad5 Mon Nov 18 15:11:03 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/myitcv/gostuff/src/github.com/myitcv/govim/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build967409403=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I just got a random failure running tests on govim:

$ go test -short -count=1 ./...
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x410fff]

goroutine 1 [running]:
runtime.throw(0x760145, 0x5)
        /home/myitcv/dev/go/src/runtime/panic.go:1106 +0x72 fp=0xc000150c18 sp=0xc000150be8 pc=0x4316b2
runtime.sigpanic()
        /home/myitcv/dev/go/src/runtime/signal_unix.go:674 +0x3cc fp=0xc000150c48 sp=0xc000150c18 pc=0x446b9c
runtime.mapassign_fast64(0x705ee0, 0x637469796d2f656d, 0x0, 0x7f256f939698)
        /home/myitcv/dev/go/src/runtime/map_fast64.go:100 +0x2f fp=0xc000150c88 sp=0xc000150c48 pc=0x410fff
go/internal/gcimporter.iImportData(0xc0000fa900, 0xc0002e7e30, 0xc000480001, 0x6dd16, 0x7fdff, 0xc000016ddd, 0x3, 0x0, 0x0, 0x0, ...)
        /home/myitcv/dev/go/src/go/internal/gcimporter/iimport.go:112 +0x4a8 fp=0xc000150fb0 sp=0xc000150c88 pc=0x65cb38
go/internal/gcimporter.Import(0xc0000fa900, 0xc0002e7e30, 0xc000016ddd, 0x3, 0x0, 0x0, 0xc0002f0610, 0x0, 0x0, 0x0)
        /home/myitcv/dev/go/src/go/internal/gcimporter/gcimporter.go:159 +0x4ab fp=0xc0001511a8 sp=0xc000150fb0 pc=0x65bedb
go/importer.(*gcimports).ImportFrom(0xc0002f2720, 0xc000016ddd, 0x3, 0x0, 0x0, 0x0, 0xc00009b8a8, 0x72ff60, 0xc00009b8a0)
        /home/myitcv/dev/go/src/go/importer/importer.go:102 +0x7c fp=0xc000151208 sp=0xc0001511a8 pc=0x6645ac
go/importer.(*gcimports).Import(0xc0002f2720, 0xc000016ddd, 0x3, 0x3, 0xc00009b928, 0x70cc01)
        /home/myitcv/dev/go/src/go/importer/importer.go:95 +0x50 fp=0xc000151260 sp=0xc000151208 pc=0x6644f0
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.run.func2(0xc000016ef1, 0x3, 0x70c680, 0xc000091e01, 0x0)
        /home/myitcv/dev/go/src/cmd/vendor/golang.org/x/tools/go/analysis/unitchecker/unitchecker.go:221 +0x95 fp=0xc0001512c8 sp=0xc000151260 pc=0x6773d5
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.importerFunc.Import(0xc0002f2740, 0xc000016ef1, 0x3, 0x0, 0x0, 0x768b00)
        /home/myitcv/dev/go/src/cmd/vendor/golang.org/x/tools/go/analysis/unitchecker/unitchecker.go:396 +0x3a fp=0xc000151300 sp=0xc0001512c8 pc=0x676f4a
go/types.(*Checker).importPackage(0xc000091440, 0x28, 0xc000016ef1, 0x3, 0xc00001e1c0, 0x63, 0x0)
        /home/myitcv/dev/go/src/go/types/resolver.go:158 +0x602 fp=0xc0001513e8 sp=0xc000151300 pc=0x625242
go/types.(*Checker).collectObjects(0xc000091440)
        /home/myitcv/dev/go/src/go/types/resolver.go:253 +0x8c6 fp=0xc000151988 sp=0xc0001513e8 pc=0x625c96
go/types.(*Checker).checkFiles(0xc000091440, 0xc0002c6180, 0x9, 0x10, 0x0, 0x0)
        /home/myitcv/dev/go/src/go/types/check.go:253 +0x95 fp=0xc0001519d8 sp=0xc000151988 pc=0x608955
go/types.(*Checker).Files(...)
        /home/myitcv/dev/go/src/go/types/check.go:246
go/types.(*Config).Check(0xc0002ef000, 0xc0000240f0, 0x49, 0xc0000fa900, 0xc0002c6180, 0x9, 0x10, 0xc0002dfe00, 0x0, 0x16, ...)
        /home/myitcv/dev/go/src/go/types/api.go:348 +0x134 fp=0xc000151a48 sp=0xc0001519d8 pc=0x5fd654
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.run(0xc0000fa900, 0xc000094dc0, 0xc0000fa7c0, 0x6, 0x8, 0x2e746c7573, 0x7de5c0, 0x707de0, 0xc0000169f8, 0xc0000a5c10)
        /home/myitcv/dev/go/src/cmd/vendor/golang.org/x/tools/go/analysis/unitchecker/unitchecker.go:235 +0x404 fp=0xc000151bf0 sp=0xc000151a48 pc=0x676514
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.Run(0x7ffc676aac5f, 0x23, 0xc0000fa7c0, 0x6, 0x8)
        /home/myitcv/dev/go/src/cmd/vendor/golang.org/x/tools/go/analysis/unitchecker/unitchecker.go:131 +0x113 fp=0xc000151eb8 sp=0xc000151bf0 pc=0x675993
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.Main(0xc0000fa7c0, 0x6, 0x8)
        /home/myitcv/dev/go/src/cmd/vendor/golang.org/x/tools/go/analysis/unitchecker/unitchecker.go:118 +0x25f fp=0xc000151f40 sp=0xc000151eb8 pc=0x6756af
main.main()
        /home/myitcv/dev/go/src/cmd/vet/main.go:35 +0x2bd fp=0xc000151f88 sp=0xc000151f40 pc=0x6bf7ad
runtime.main()
        /home/myitcv/dev/go/src/runtime/proc.go:203 +0x212 fp=0xc000151fe0 sp=0xc000151f88 pc=0x433b72
runtime.goexit()
        /home/myitcv/dev/go/src/runtime/asm_amd64.s:1375 +0x1 fp=0xc000151fe8 sp=0xc000151fe0 pc=0x45f5f1

What did you expect to see?

No panic

What did you see instead?

Panic

cc @randall77 @bcmills @jayconrod

This is coming from the cmd/vet subprocess.

The failing line is here:

p.typCache[uint64(i)] = pt

That map is a field on an unshared, local struct of type iimporter, and is initialized unconditionally just above:

typCache: make(map[uint64]types.Type),

This looks like a compiler or runtime bug to me.
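To make the argument above concrete, here is a minimal sketch of the same pattern. The `importer` type, the `fill` function, and the `string` value type are hypothetical stand-ins (the real code stores `types.Type` values in an `iimporter`); the sketch only shows the shape of the code, not the actual importer.

```go
package main

import "fmt"

// importer is a hypothetical stand-in for the iimporter struct; only the
// map field matters for this illustration.
type importer struct {
	typCache map[uint64]string
}

// fill mirrors the shape of the failing code: the struct is created locally
// with its map already allocated, then entries are assigned by index.
func fill(n int) int {
	p := importer{
		typCache: make(map[uint64]string), // unconditional initialization
	}
	for i := 0; i < n; i++ {
		p.typCache[uint64(i)] = fmt.Sprintf("type%d", i)
	}
	return len(p.typCache)
}

func main() {
	fmt.Println(fill(3))
}
```

Because the map is allocated before any assignment and the struct never leaves the function, ordinary Go code of this shape should not fault on the assignment, which is why the failure points away from the importer itself.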

CC @mdempsky @aclements @mknyszek @ianlancetaylor

Marking as release-blocker to at least triage before 1.14. (If we understand the root cause better, we can reprioritize as appropriate.)

There have been a bunch of bug reports recently that all smell like memory corruption: this one, #35621, #35326, #35592, #35658.

I compiled cmd/vet at the indicated commit (0ac8739) and got the exact same binary, as far as I can tell. The assembly, with the faulting instruction marked, is here:

   0x0000000000410ff1 <+33>:    mov    0x48(%rsp),%rax
   0x0000000000410ff6 <+38>:    test   %rax,%rax
   0x0000000000410ff9 <+41>:    je     0x4112f3 <runtime.mapassign_fast64+803>
   0x0000000000410fff <+47>:    movzbl 0x8(%rax),%ecx     <====================SEGV

So we are loading the argument h (type *hmap) into %rax and jumping to a panic if it is nil/zero (first three assembly instructions). But then when we do a load through %rax (at offset 0x8), we get a SEGV, and the address for the SEGV is reported as 0 ("unexpected fault address 0x0").
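For contrast, the sketch below shows what the nil check in those first three instructions normally leads to. The function name `assignToNil` is made up for illustration. Assigning to a nil map produces an ordinary, recoverable Go panic, not a SIGSEGV, so the crash in this report cannot be explained by a nil map alone.

```go
package main

import "fmt"

// assignToNil writes to a nil map and returns the text of the recovered
// panic value, or "no panic" if the assignment unexpectedly succeeds.
func assignToNil() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var m map[uint64]int // nil: never initialized with make
	m[1] = 1             // the runtime's nil check fires here
	return "no panic"
}

func main() {
	fmt.Println(assignToNil())
}
```

The recovered message is "assignment to entry in nil map", which is a different failure mode from the "unexpected fault address" fatal error in the trace.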

@aclements Any chance that preemption is happening between these instructions and not restoring the %rax register (i.e. changing it from non-zero to zero)? Just a thought, since I believed there was still no preemption in runtime code.

Otherwise, this is pretty mysterious, since the map was just initialized above, as pointed out by @bcmills

One other thing to note is that the h arg of runtime.mapassign_fast64 in the stacktrace looks like a bogus pointer (I think) -- 0x637469796d2f656d. But I'm not sure these stack args are always right during a panic, etc. But it is definitely not zero.
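One way to examine a suspicious value like this is to look at its bytes in memory order and to test whether it could be a valid address at all. The sketch below does both, assuming the printed argument is accurate and assuming 48-bit virtual addressing on amd64; the helper names are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// asLittleEndianText returns the 8 bytes of v in the order they would sit
// in memory on a little-endian machine such as amd64, interpreted as text.
func asLittleEndianText(v uint64) string {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], v)
	return string(b[:])
}

// isCanonical48 reports whether addr is canonical under the assumption of
// 48-bit virtual addresses: bits 63..47 must be all zeros or all ones.
func isCanonical48(addr uint64) bool {
	top := addr >> 47
	return top == 0 || top == 0x1ffff
}

func main() {
	const h = 0x637469796d2f656d // second argument in the stack trace
	fmt.Printf("%q canonical=%v\n", asLittleEndianText(h), isCanonical48(h))
}
```

Under these assumptions the value decodes to the ASCII text "me/myitc" and is not a canonical address. That is suggestive rather than conclusive: it resembles a fragment of a path such as those under /home/myitcv in the go env output, which would fit string data having overwritten the pointer. A non-canonical address may also be consistent with the fault being reported with addr=0x0 even though the register was non-zero.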

Just to point out one thing that I have observed as an uninformed reporter of these bugs: I only ever see these problems immediately after (~1 sec) starting a command, e.g. go test. That is to say if I don't see an initial error (like any of those that I have reported), I won't see one.

@myitcv, what kernel version and distro are you running?

@danscales, failing to restore rax seems really unlikely. I think this is a cascade from some earlier corruption. Though my hunch is that the earlier corruption has to do with some register corruption.

@aclements - you have the correct details for me in #35326 (comment).

If it's relevant, I'm running this in a VMWare Fusion virtual host atop macOS 10.12.6. I can of course provide any more details you need.

Ah, thanks. I'm losing track of who's reported what. I'll add this to the super-bug (#35777).

I suspect the VMWare isn't relevant, but that's good to know.

Any chance you're able to reproduce this?

Unfortunately not. All of the instances I have reported have been totally random and unreproducible.

The other "feature" I observed was noted in #35689 (comment), i.e. I have only seen issues during the compile step of running go test. Once everything is compiled and running, I have observed no other runtime issues (although I will say the tests in question are exactly heavily stressing the Go runtime).

Just as a follow-up: setting GOCACHE=$(mktemp -d) does allow me to relatively reliably reproduce a variant of the version skew issue, but I haven't been able to reproduce one of these "others".

Thanks. Since this particular failure doesn't seem to be reproducible, closing in favor of the super-bug.