syke99/gworker

Bug: loop variable capture, causes data races and probably wrong behavior

Closed this issue · 7 comments

Here i is not properly captured:

	for i := 0; i < p.size; i++ {
		go func(p *Pool[T, P], doneChan chan struct{}) {
			defer func() {
				doneChan <- struct{}{}
			}()

			p.dataSources.Lock()
			ds := p.dataSources.data[i]
			p.dataSources.Unlock()

			p.removeDataSourceByIndex(i)

			p.workerFunc(ds, params)
		}(p, doneChan)

I found this by running the tests with the data race detector:

% go test -race
This greeting is English
This greeting is Spanish
This greeting is French
This greeting is Mandarin
This greeting is Japanese
This greeting is Italian
==================
WARNING: DATA RACE
Read at 0x00c000206048 by goroutine 20:
  github.com/syke99/gworker.(*Pool[...]).runWithAutoRefill.func1()
      gworker/gworker.go:166 +0xc8
  github.com/syke99/gworker.(*Pool[...]).runWithAutoRefill.func4()
      gworker/gworker.go:172 +0x54

Previous write at 0x00c000206048 by goroutine 18:
  github.com/syke99/gworker.(*Pool[...]).runWithAutoRefill()
      gworker/gworker.go:159 +0x88
  github.com/syke99/gworker.(*Pool[...]).Start()
      gworker/gworker.go:150 +0x510
  github.com/syke99/gworker.TestPool_StartWithAutoRefill()
      gworker/gworker_test.go:136 +0x360
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x180
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x40

Goroutine 20 (running) created at:
  github.com/syke99/gworker.(*Pool[...]).runWithAutoRefill()
      gworker/gworker.go:160 +0x70
  github.com/syke99/gworker.(*Pool[...]).Start()
      gworker/gworker.go:150 +0x510
  github.com/syke99/gworker.TestPool_StartWithAutoRefill()
      gworker/gworker_test.go:136 +0x360
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x180
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1629 +0x40

Goroutine 18 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1629 +0x5b4
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2036 +0x80
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1576 +0x180
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2034 +0x6a8
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1906 +0x8e0
  main.main()
      _testmain.go:53 +0x2b8
==================
This greeting is Japanese
This greeting is Mandarin
This greeting is Italian
This greeting is English
This greeting is Spanish
This greeting is French
--- FAIL: TestPool_StartWithAutoRefill (2.01s)
    testing.go:1446: race detected during execution of test
FAIL
exit status 1
FAIL	github.com/syke99/gworker	4.281s
syke99 commented

great catch, I totally missed that. Thank you so much for finding and reporting it!! I'll get a fix up for it shortly

You're welcome :)

Hey, this class of bug might go away in a future version of Go! golang/go#60078

syke99 commented

Yeah, I remember seeing this exact discussion not long ago!! It would be really cool if it goes through. Thanks again!!

syke99 commented

Wasn't thinking and just committed directly to main, but this bug is now fixed with v0.2.2 and released!!

Related: (*sync.Mutex).TryLock has a warning about being subtle, and not checking its return value is also suspicious.

I've never used it myself.

syke99 commented

Indeed it does and I've not really used it much myself. So I am going to use gworker to write some more robust tests to see if it needs a different approach. But thank you for mentioning this here as a reminder!!

syke99 commented

I still plan to write more robust tests, but I went ahead and implemented a recursive TryLock that, while still has the potential of running into a livelock situation, it's now much less likely to happen than a deadlock