boltdb/bolt

Crash in WSL (Windows Subsystem for Linux)

chrisdostert opened this issue · 6 comments

Only in WSL, a cursor seek crashes the process with an unexpected fault address (SIGSEGV).

source code: https://github.com/opctl/opctl/blob/master/util/pubsub/eventRepo.go#L85

full trace:

unexpected fault address 0x7f6bd3d94008
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x7f6bd3d94008 pc=0x52eeb9]

goroutine 14 [running]:
runtime.throw(0xab330c, 0x5)
        /usr/local/go/src/runtime/panic.go:596 +0x95 fp=0xc4203418f8 sp=0xc4203418d8
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:297 +0x28c fp=0xc420341948 sp=0xc4203418f8
github.com/opctl/opctl/vendor/github.com/boltdb/bolt.(*Cursor).search(0xc420341af0, 0xc420341bb0, 0x6, 0x20, 0x4)
        /go/src/github.com/opctl/opctl/vendor/github.com/boltdb/bolt/cursor.go:255 +0x69 fp=0xc420341a10 sp=0xc420341948
github.com/opctl/opctl/vendor/github.com/boltdb/bolt.(*Cursor).seek(0xc420341af0, 0xc420341bb0, 0x6, 0x20, 0x0, 0x0, 0x2, 0x2, 0xc420178110, 0x2, ...)
        /go/src/github.com/opctl/opctl/vendor/github.com/boltdb/bolt/cursor.go:159 +0xb1 fp=0xc420341a60 sp=0xc420341a10
github.com/opctl/opctl/vendor/github.com/boltdb/bolt.(*Bucket).Bucket(0xc420096a98, 0xc420341bb0, 0x6, 0x20, 0xc420341bb0)
        /go/src/github.com/opctl/opctl/vendor/github.com/boltdb/bolt/bucket.go:112 +0x108 fp=0xc420341b20 sp=0xc420341a60
github.com/opctl/opctl/vendor/github.com/boltdb/bolt.(*Tx).Bucket(0xc420096a80, 0xc420341bb0, 0x6, 0x20, 0x6)
        /go/src/github.com/opctl/opctl/vendor/github.com/boltdb/bolt/tx.go:101 +0x4f fp=0xc420341b58 sp=0xc420341b20
github.com/opctl/opctl/util/pubsub.(*eventRepo).Add.func1(0xc420096a80, 0xad6fb8, 0xc420096a80)
        /go/src/github.com/opctl/opctl/util/pubsub/eventRepo.go:60 +0x8a fp=0xc420341bf8 sp=0xc420341b58
github.com/opctl/opctl/vendor/github.com/boltdb/bolt.(*DB).Update(0xc420422f00, 0xc420341c68, 0x0, 0x0)
        /go/src/github.com/opctl/opctl/vendor/github.com/boltdb/bolt/db.go:598 +0x9f fp=0xc420341c48 sp=0xc420341bf8
github.com/opctl/opctl/util/pubsub.(*eventRepo).Add(0xc42035f8c0, 0xc4201c99e0)
        /go/src/github.com/opctl/opctl/util/pubsub/eventRepo.go:68 +0x57 fp=0xc420341c88 sp=0xc420341c48
github.com/opctl/opctl/util/pubsub.(*pubSub).Publish(0xc42035f8f0, 0xc4201c99e0)
        /go/src/github.com/opctl/opctl/util/pubsub/pubSub.go:88 +0x55 fp=0xc420341d20 sp=0xc420341c88
github.com/opctl/opctl/node/core._opCaller.Call(0xda7880, 0xc4201c8180, 0xda9700, 0xc42035f8f0, 0xdadac0, 0xc4201e3220, 0xda4380, 0xc42001a3c0, 0xc420468ae0, 0xc4201dacc0, ...)
        /go/src/github.com/opctl/opctl/node/core/opCaller.go:142 +0x441 fp=0xc420341e40 sp=0xc420341d20
github.com/opctl/opctl/node/core.(*_opCaller).Call(0xc42037c300, 0xc420468ae0, 0xc4201dacc0, 0x20, 0xc420468fc0, 0x2a, 0xc4201dacc0, 0x20, 0xc420469020, 0x0, ...)
        <autogenerated>:21 +0xdc fp=0xc420341ee0 sp=0xc420341e40
github.com/opctl/opctl/node/core._core.StartOp.func1(0xdacd80, 0xc42035e480, 0xdadac0, 0xc4201e3220, 0xda68c0, 0xc42037c300, 0xda6900, 0xc4201e3240, 0xda9700, 0xc42035f8f0, ...)
        /go/src/github.com/opctl/opctl/node/core/startOp.go:40 +0x93 fp=0xc420341f48 sp=0xc420341ee0
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420341f50 sp=0xc420341f48
created by github.com/opctl/opctl/node/core._core.StartOp
        /go/src/github.com/opctl/opctl/node/core/startOp.go:41 +0x349

I actually have the same issue when trying to build and run etcd. I snooped around in the code, and it is possibly caused by a difference in how growing a memory map is handled on Windows versus Linux, i.e. on Windows the mmap is always redone when it is grown. I tried to reproduce that behavior on Bash on Windows as well, but I got stuck on some other errors at that point, so I gave up.

Almost certainly this is a WSL bug, not a boltdb bug. Can you file a bug here: https://github.com/Microsoft/BashOnWindows and include an strace?

I managed to make a minimal reproduction case with the following code, and I will file a report with that at Bash On Windows:

package main

import (
        "fmt"
        "log"
        "os"

        "github.com/boltdb/bolt"
)

func main() {
        // Start from a fresh database file.
        os.Remove("test.db")
        db, err := bolt.Open("test.db", 0600, nil)
        if err != nil {
                log.Fatal(err)
        }
        defer db.Close()

        // Create an empty bucket in a write transaction.
        err = db.Update(func(tx *bolt.Tx) error {
                _, err := tx.CreateBucket([]byte("MyBucket"))
                if err != nil {
                        return fmt.Errorf("create bucket: %s", err)
                }
                return nil
        })
        if err != nil {
                log.Fatal(err)
        }

        // Seek in the empty bucket; this is where the fault occurs under WSL.
        err = db.View(func(tx *bolt.Tx) error {
                b := tx.Bucket([]byte("MyBucket"))

                c := b.Cursor()

                c.Seek([]byte("test"))

                return nil
        })
        if err != nil {
                log.Fatal(err)
        }
        os.Remove("test.db")
}

This is fixed in the Windows 10 Fall Creators Update.

Confirmed on our side as well; cheers!