tebeka/snowball

Various fatal error panics when using in goroutines

Closed this issue · 3 comments

Hey there. I've been trying to use this snowball stemmer implementation, and with sequential code it works great. Unfortunately, when I call it from goroutines I get seemingly random panics with different kinds of errors. Perhaps I'm doing something wrong, but the issue is really easy to reproduce. I did so by cloning the project and writing the following example next to the existing one:

package snowball_test

import (
	"fmt"
	"sync"

	"github.com/tebeka/snowball"
	"github.com/zhexuany/wordGenerator"
)

func ExampleStemInGoroutines() {
	var wg sync.WaitGroup

	stemmer, err := snowball.New("english")
	if err != nil {
		fmt.Println("error", err)
		return
	}

	words := wordGenerator.GetWords(500, 20)
	stemmas := make([]*string, len(words))

	for i, word := range words {
		wg.Add(1)

		go func(i int, word string) {
			defer wg.Done()
			stemma := stemmer.Stem(word)
			stemmas[i] = &stemma
		}(i, word)
	}

	wg.Wait()
	fmt.Println("test")

	// Output:
	// test
}

Note: with a smaller number of words the issue doesn't show up, so I had to use github.com/zhexuany/wordGenerator to demonstrate when and how it emerges.

Here are just some errors I'm getting:

fatal error: unexpected signal during runtime execution
snowball.test(2553,0x700001f1f000) malloc: *** set a breakpoint in malloc_error_break to debug
[signal SIGSEGV: segmentation violation code=0x1 addr=0x5afffea pc=0x7fff70574b49]

runtime stack:
runtime.throw(0x416fb4b, 0x2a)
        /usr/local/Cellar/go/1.14.1/libexec/src/runtime/panic.go:1114 +0x72
runtime.sigpanic()
        /usr/local/Cellar/go/1.14.1/libexec/src/runtime/signal_unix.go:679 +0x46a

snowball.test(4038,0x70000950a000) malloc: Region cookie corrupted for region 0x9c00000 (value is 7277)[0x9c0407c]
snowball.test(4038,0x70000950a000) malloc: *** set a breakpoint in malloc_error_break to debug
SIGABRT: abort
PC=0x7fff704c633a m=7 sigcode=0

panic: runtime error: gobytes: length out of range

goroutine 121 [running]:
github.com/tebeka/snowball._Cfunc_GoBytes(...)
        _cgo_gotypes.go:63
github.com/tebeka/snowball.(*Stemmer).Stem.func4(0x5c04338, 0xfffffffe, 0xc000099a00, 0x9, 0x5c04338)
        /Users/smileart/Sync/Projects/snowball/snowball.go:73 +0x59
github.com/tebeka/snowball.(*Stemmer).Stem(0xc00008e020, 0xc000099a00, 0x9, 0x0, 0x0)
        /Users/smileart/Sync/Projects/snowball/snowball.go:73 +0xcb
github.com/tebeka/snowball_test.ExampleStemGoroutines.func1(0xc000098020, 0xc00008e020, 0xc0000cc000, 0x1f4, 0x1f4, 0x56, 0xc000099a00, 0x9)
        /Users/smileart/Sync/Projects/snowball/example_test.go:41 +0x8b
created by github.com/tebeka/snowball_test.ExampleStemGoroutines
        /Users/smileart/Sync/Projects/snowball/example_test.go:38 +0x19a
exit status 2
FAIL    github.com/tebeka/snowball      0.207s

And so on. I'm not providing all the stack traces because the issue is really consistent and easy to reproduce with the code above, so I guess it'd be easier for you to get them yourself. I'm not a C expert by any means, but I've seen somewhat similar symptoms discussed in many places (here are just a couple of them: link, link), so my guess would be that either my usage or the C code is wrong.

And here's my Go env, just in case:

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/smileart/Library/Caches/go-build"
GOENV="/Users/smileart/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/smileart/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14.1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/smileart/Sync/Projects/snowball/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3x/b7d_4f997rxfrv1qb81hscp80000gn/T/go-build699026935=/tmp/go-build -gno-record-gcc-switches -fno-common"

Thanks in advance. And thank you for the project. 👌🙏

Thanks for reporting, I'll try to have a look soon.
IMO it'll be safer to create a stemmer per goroutine or keep them in a sync.Pool. I'm not sure the underlying C library is thread (and goroutine) safe.
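
The sync.Pool suggestion could look roughly like the sketch below. To keep it runnable without cgo, the `stemmer` type here is a hypothetical stand-in for `*snowball.Stemmer`, not the real library; `pool` and `stemAll` are illustrative names too. With the real package, the pool's `New` function would presumably call `snowball.New("english")` and would need to handle the returned error.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stemmer is a hypothetical stand-in for *snowball.Stemmer.
// Like a C-backed stemmer, it keeps mutable internal state.
type stemmer struct{ buf []byte }

// Stem overwrites s.buf on every call, so a single stemmer
// must not be used by two goroutines at the same time.
func (s *stemmer) Stem(word string) string {
	s.buf = append(s.buf[:0], strings.ToLower(word)...)
	return strings.TrimSuffix(string(s.buf), "s")
}

// pool hands out stemmers; a stemmer taken with Get is owned by
// the caller until it is returned with Put.
var pool = sync.Pool{
	New: func() interface{} { return &stemmer{} },
}

// stemAll stems every word in its own goroutine, borrowing a
// stemmer from the pool for the duration of one call.
func stemAll(words []string) []string {
	out := make([]string, len(words))
	var wg sync.WaitGroup
	for i, w := range words {
		wg.Add(1)
		go func(i int, w string) {
			defer wg.Done()
			s := pool.Get().(*stemmer)
			defer pool.Put(s)
			out[i] = s.Stem(w)
		}(i, w)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(stemAll([]string{"Cats", "dogs", "bird"}))
}
```

The pool avoids allocating a new stemmer per word while still guaranteeing that no two goroutines touch the same instance concurrently.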

I moved the stemmer creation inside the goroutine and the code works. I don't have plans to make the stemmer goroutine safe, but I will add this fact to the documentation.
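
A minimal sketch of the "one stemmer per goroutine" arrangement, using a fixed number of workers so that a stemmer is not created for every single word. The `stemmer` type is again a hypothetical stand-in so the example runs without cgo; `job` and `stemWithWorkers` are made-up names, and with the real library each worker would presumably call `snowball.New("english")` and check the error.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stemmer is a hypothetical stand-in for *snowball.Stemmer.
type stemmer struct{ buf []byte }

// Stem mutates internal state, so it is not safe to share.
func (s *stemmer) Stem(word string) string {
	s.buf = append(s.buf[:0], strings.ToLower(word)...)
	return strings.TrimSuffix(string(s.buf), "s")
}

// job carries a word and the index where its stem should be stored.
type job struct {
	i    int
	word string
}

// stemWithWorkers starts a fixed number of goroutines. Each one
// creates its own stemmer and never shares it with another goroutine.
func stemWithWorkers(words []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	out := make([]string, len(words))
	jobs := make(chan job)
	var wg sync.WaitGroup
	for n := 0; n < workers; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s := &stemmer{} // created inside the goroutine
			for j := range jobs {
				out[j.i] = s.Stem(j.word)
			}
		}()
	}
	for i, w := range words {
		jobs <- job{i, w}
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	fmt.Println(stemWithWorkers([]string{"Cats", "dogs", "bird"}, 2))
}
```

Each worker writes only to distinct indices of `out`, so the result slice itself needs no extra locking.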

Sure, but if I provide a library that uses this one internally, creating a Stemmer in its New method and reusing it later on, I'd have to ask my users to avoid using it in goroutines as well. ¯\_(ツ)_/¯ Anyway, thanks for the answer and help. 🤝