mediocregopher/radix

Connection pool panic (v4)

Closed this issue · 12 comments

Hello, running into a connection pool panic with the pool size set to 30

panic: pushing onto front at 29 but it already has an element
goroutine 11 [running]:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront(0xc000102960, 0xc0011cae20)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool_conn_coll.go:84 +0x11f
github.com/mediocregopher/radix/v4.(*pool).putConn.func1(0xc000100b1c, 0xc000639b01)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool.go:387 +0x3f
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock(0xc000100ae0, 0xc00a4e1e58, 0x0, 0x0)
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:116 +0xb9
github.com/mediocregopher/radix/v4.(*pool).putConn(0xc000144120, 0xc0011cae20, 0xc000639b60)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool.go:386 +0x65
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1(0xc49d00, 0xc00010c240)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool.go:331 +0x347
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1(0xc00000c280, 0xc000100ae0)
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:45 +0x3c
created by github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:44 +0x66

Hi @kollektiv , thanks for testing out v4! I just pushed a fix to the poolConnColl where it's internal state would become inconsistent in certain circumstances, specifically when a connection was being discarded due to having errored. So that may have been the cause of your panic here, but let me know if you're still seeing it even with this fix.

I believe this bug also caused your #253 issue.

Finally, I think your #255 issue is likely the same as #252, which I also just pushed a fix for. So if you could retest and let me know if you're seeing any of these issues still that'd be great, and if not please close the relevant issues so I know it's not still a problem. Thanks!

I have this same crash on the updated code (github.com/mediocregopher/radix/v4@9b8ee0e8032b6fe2a8649824cbfce11d00cd8774). I'm not having any luck creating a minimal reproduction right now, but here's what I can give you.

I am running a mixture of Cmd, FlatCmd and EvalScript from a pool of the default size (4). This is coming from a test in a suite of integration tests, and only fails when this test is run in the suite, it does not fail when isolated. All of my testing is happy path, and I am not (intentionally) causing Redis errors.

Panic

panic: pushing onto front at 0 but it already has an element

goroutine 51 [running]:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront(0xc00068e930, 0xc0003fdcc0)
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool_conn_coll.go:85 +0x22d
github.com/mediocregopher/radix/v4.(*pool).putConn.func1(0x0, 0x0)
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:387 +0x65
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock(0xc000045140, 0xc000513d98, 0x0, 0x0)
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:116 +0xf2
github.com/mediocregopher/radix/v4.(*pool).putConn(0xc00044a240, 0xc0003fdcc0, 0xc000a84700)
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:386 +0x89
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1(0x1fb9140, 0xc0002eb500)
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:331 +0x545
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1(0xc0009d4200, 0xc000045140)
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45 +0x48
created by github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run
	<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44 +0x6b

Go-routine Dump (Unique stacks) (I've removed everything that isn't from radix)

Goroutine 14
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:67)
  github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 15
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
  github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
  github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 28
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*conn).EncodeDecode (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:135)
  github.com/mediocregopher/radix/v4.(*poolConn).EncodeDecode (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:34)
  github.com/mediocregopher/radix/v4.(*evalAction).Perform.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:461)
  github.com/mediocregopher/radix/v4.(*evalAction).Perform (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:464)
  github.com/mediocregopher/radix/v4.(*poolConn).Do (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:45)
  github.com/mediocregopher/radix/v4.(*pool).useSharedConn.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:407)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithRLock (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:104)
  github.com/mediocregopher/radix/v4.(*pool).useSharedConn (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:400)
  github.com/mediocregopher/radix/v4.(*pool).Do (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:429)
  <snip, code under test>
  github.com/stretchr/testify/suite.(*Suite).Run.func1 (<snip>/go/pkg/mod/github.com/stretchr/testify@v1.6.1/suite/suite.go:77)
  testing.tRunner (/usr/local/Cellar/go/1.15.6/libexec/src/testing/testing.go:1123)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: testing.(*T).Run (/usr/local/Cellar/go/1.15.6/libexec/src/testing/testing.go:1168)

Goroutine 41
  runtime.fatalpanic (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:1189)
  runtime.gopanic (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:1064)
  github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool_conn_coll.go:85)
  github.com/mediocregopher/radix/v4.(*pool).putConn.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:387)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:116)
  github.com/mediocregopher/radix/v4.(*pool).putConn (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:386)
  github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:331)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 43
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*pool).atIntervalDo.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:291)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 46
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:67)
  github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 47
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
  github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
  github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 59
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:307)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 61
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*pool).atIntervalDo.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:291)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 71
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.netpollblock (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/netpoll.go:438)
  internal/poll.runtime_pollWait (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/netpoll.go:222)
  internal/poll.(*pollDesc).wait (/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:87)
  internal/poll.(*pollDesc).waitRead (/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:92)
  internal/poll.(*FD).Read (/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_unix.go:159)
  net.(*netFD).Read (/usr/local/Cellar/go/1.15.6/libexec/src/net/fd_posix.go:55)
  net.(*conn).Read (/usr/local/Cellar/go/1.15.6/libexec/src/net/net.go:182)
  github.com/mediocregopher/radix/v4.(*connAddrWrap).Read (<autogenerated>:1)
  bufio.(*Reader).fill (/usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:101)
  bufio.(*Reader).Peek (/usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:139)
  github.com/mediocregopher/radix/v4/resp/resp3.discardAttribute (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:149)
  github.com/mediocregopher/radix/v4/resp/resp3.peekAndAssertPrefix (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:183)
  github.com/mediocregopher/radix/v4/resp/resp3.readAndAssertPrefix (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:217)
  github.com/mediocregopher/radix/v4/resp/resp3.unmarshalAggHeader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:892)
  github.com/mediocregopher/radix/v4/resp/resp3.(*ArrayHeader).UnmarshalRESP (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:945)
  github.com/mediocregopher/radix/v4.Tuple.UnmarshalRESP (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:327)
  github.com/mediocregopher/radix/v4.(*Tuple).UnmarshalRESP (<autogenerated>:1)
  github.com/mediocregopher/radix/v4/resp/resp3.Unmarshal (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:1661)
  github.com/mediocregopher/radix/v4.(*evalAction).UnmarshalRESP (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:455)
  github.com/mediocregopher/radix/v4/resp/resp3.Unmarshal (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:1661)
  github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:85)
  github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 72
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
  github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
  github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 73
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:307)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 75
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*pool).atIntervalDo.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:291)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 94
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:67)
  github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

Goroutine 95
  runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
  runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
  github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
  github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
  github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
  github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
  runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
  created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)

I did have a test that was essentially running 100 EvalScripts in 100 threads crashing for the same reason, but the very recent commits have fixed that

I chucked some printing breakpoints on the modifications of p.s in pool_conn_coll.go just in case this helps, since I think it's some of the logic in there might be at fault. The block of code in this comment is a log of when the lines of code are executed and their stack when they are.

I'm going to guess that it's either:

  • hole in the removal logic. Somehow the Wraparound logic on pool_conn_coll:73 and pushFront aren't playing well together.
  • a hole in the thread safety. The modifications to the slice and p.first updates aren't atomic, so there might be a lock problem running, something running two modifications at the same time. They all seem to be going in through WithLock, and it looks likepool.proc isn't reset so it seems solid. Just a thought.
Breakpoint reached: pool_conn_coll.go:87
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
	github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
	github.com/mediocregopher/radix/v4.poolConfig.new at pool.go:187
	github.com/mediocregopher/radix/v4.PoolConfig.New at pool.go:228
    <SNIP>
Breakpoint reached: pool_conn_coll.go:69
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:69
	github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
	github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
	github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
    <SNIP>
Breakpoint reached: pool_conn_coll.go:73
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:73
	github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
	github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
	github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
    <SNIP>
Breakpoint reached: pool_conn_coll.go:87
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
	github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
	github.com/mediocregopher/radix/v4.poolConfig.new at pool.go:187
	github.com/mediocregopher/radix/v4.PoolConfig.New at pool.go:228
    <SNIP>
Breakpoint reached: pool_conn_coll.go:69
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:69
	github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
	github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
	github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
    <SNIP>
Breakpoint reached: pool_conn_coll.go:73
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:73
	github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
	github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
	github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
    <SNIP>
Breakpoint reached: pool_conn_coll.go:87
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
	github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
	github.com/mediocregopher/radix/v4.poolConfig.new at pool.go:187
	github.com/mediocregopher/radix/v4.PoolConfig.New at pool.go:228
    <SNIP>
Breakpoint reached: pool_conn_coll.go:87
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
	github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
	github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 at pool.go:331
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 at proc.go:45
	runtime.goexit at asm_amd64.s:1374
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run at proc.go:44
Breakpoint reached: pool_conn_coll.go:87
Stack:
	github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
	github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
	github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
	github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 at pool.go:331
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 at proc.go:45
	runtime.goexit at asm_amd64.s:1374
	github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run at proc.go:44

This is followed by the panic in my first comment.

@thessem thanks so much for all that debugging it was very helpful. I noticed that you're getting a lot of remove calls on your poolConnColl, which indicated to me that you're getting a lot of network errors resulting in connection's being closed and remade. Since I couldn't reproduce any errors on the poolConnColl directly I thought that maybe there's a bug in the reconnect logic which is causing too many connections to be made.

Turns out there was a bug, which I just pushed a fix for. I'm a bit concerned about the fact that you're seeing errors in the first place though. Were you seeing network errors (or any other non-redis error) when you ran the test code? Were they expected?

@mediocregopher I hadn't actually thought about it that much! My test clean-up was cancelling the context and then immediately performing a Do that didn't have its error checked (oops), using the cancelled context. This was causing the connection errors (the err was context cancelled, which is what you'd expect). Fixing that bug in my test code prevents this crash.

9f32b1f fixes my problem too.

Given that revelation, here's my reproduction of my crash. Unfortunately it's a little racy, but this crashes 75% of the time for me (but doesn't crash on 9f32b1f, so this is fixed, and I'm only giving you this just in case it satisfies your intellectual curiosity)

package main

import (
	"context"
	"time"

	"github.com/mediocregopher/radix/v4"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	client, err := radix.PoolConfig{
		Size: 1,
	}.New(
		ctx,
		"tcp",
		"localhost:36379",
	)
	if err != nil {
		panic(err)
	}
	go func() {
		_ = client.Do(ctx, radix.Cmd(nil, "BLPOP", "foobar", "2"))
	}()
	go func() {
		cancel()
	}()
	_ = client.Do(ctx, radix.Cmd(nil, "PING"))
	time.Sleep(3*time.Second)
}
panic: pushing onto front at 0 but it already has an element

goroutine 52 [running]:
github.com/mediocregopher/radix.(*poolConnColl).pushFront(0xc00011e600, 0xc00000e180)
	<snip>/go/src/github.com/mediocregopher/radix/pool_conn_coll.go:85 +0x11f
github.com/mediocregopher/radix.(*pool).putConn.func1(0xc0001381bc, 0xc000182401)
	<snip>/go/src/github.com/mediocregopher/radix/pool.go:387 +0x3f
github.com/mediocregopher/radix/internal/proc.(*Proc).WithLock(0xc000138180, 0xc000151e58, 0x0, 0x0)
	<snip>/go/src/github.com/mediocregopher/radix/internal/proc/proc.go:116 +0xb9
github.com/mediocregopher/radix.(*pool).putConn(0xc000144120, 0xc00000e180, 0xc000182480)
	<snip>/go/src/github.com/mediocregopher/radix/pool.go:386 +0x65
github.com/mediocregopher/radix.(*pool).runReconnect.func1(0x11b9ee0, 0xc00014e040)
	<snip>/go/src/github.com/mediocregopher/radix/pool.go:331 +0x347
github.com/mediocregopher/radix/internal/proc.(*Proc).Run.func1(0xc0001880c0, 0xc000138180)
	<snip>/go/src/github.com/mediocregopher/radix/internal/proc/proc.go:45 +0x3c
created by github.com/mediocregopher/radix/internal/proc.(*Proc).Run
	<snip>/go/src/github.com/mediocregopher/radix/internal/proc/proc.go:44 +0x66
exit status 2

Hi @mediocregopher, thank you for pushing the fixes

I tried out 9f32b1f, and ran into this panic. The pool size is configured to be 30

panic: runtime error: index out of range [2] with length 1
goroutine 1126708 [running]:
github.com/mediocregopher/radix/v4.(*cmdAction).MarshalRESP(0xc007994a10, 0xc3f240, 0xc008f4d1c0, 0xc0075c6ea0, 0xc007994a10, 0x1)
/root/app/vendor/github.com/mediocregopher/radix/v4/action.go:256 +0x1aa
github.com/mediocregopher/radix/v4/resp/resp3.Marshal(0xc3f240, 0xc008f4d1c0, 0xabdce0, 0xc007994a10, 0xc0075c6ea0, 0x0, 0x0)
/root/app/vendor/github.com/mediocregopher/radix/v4/resp/resp3/resp.go:1356 +0x1cce
github.com/mediocregopher/radix/v4.(*connWriter).write(0xc005cfdee8, 0xc4af80, 0xc008b85b60, 0xabdce0, 0xc007994a10, 0xabdce0, 0xc007994a10, 0xc0078cc7e0, 0xa7cb01)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:97 +0x177
github.com/mediocregopher/radix/v4.(*connWriter).flush(0xc005cfdee8, 0xc005cfdd90)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:114 +0x150
github.com/mediocregopher/radix/v4.(*connWriter).run(0xc005cfdee8)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:154 +0x236
github.com/mediocregopher/radix/v4.(*conn).writer(0xc00b191680, 0xc4af00, 0xc008f4d180, 0x0)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:53 +0x10b
github.com/mediocregopher/radix/v4.Dialer.Dial.func1(0xc4af00, 0xc008f4d180)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn.go:334 +0x47
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1(0xc0075cdf80, 0xc00b191500)
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:45 +0x3c
created by github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:44 +0x66

Noting that with 9f32b1f I saw #255 reoccur, but not #253

@kollektiv here is the code which appears to be panicking:

    for i := range c.args {                                                      
        err = marshalBlobString(err, w, c.args[i], o)                            
    }

with the panic happening on the c.args[i]. The only way for that to be possible is if c.args is being changed from another go-routine during that loop. Can you confirm that you're not using the same Cmd/FlatCmd instance more than once? Each instance can be passed to a Do only once.

Hey @mediocregopher, I double checked and the same Cmd / FlatCmd is not being used more than once.

They are being reused and stored in a sync.Pool in radix's action.go. Perhaps they could be reused from some other issue?

@kollektiv I tried to repro via this benchmark:

func BenchmarkPool(b *B) {
	ctx := context.Background()
	pool, err := (PoolConfig{}).New(ctx, "tcp", "127.0.0.1:6379")
	if err != nil {
		panic(err)
	}

	echo := NewEvalScript(`return ARGV[1]`)

	b.ResetTimer()
	b.RunParallel(func(pb *PB) {
		for pb.Next() {
			var err error
			switch rand.Intn(9) {
			case 0:
				var res string
				err = pool.Do(ctx, Cmd(&res, "ECHO", fmt.Sprint(time.Now().UnixNano())))
			case 1:
				var res string
				err = pool.Do(ctx, FlatCmd(&res, "ECHO", time.Now().UnixNano()))
			case 2:
				err = pool.Do(ctx, Cmd(nil, "ECHO", fmt.Sprint(time.Now().UnixNano())))
			case 3:
				err = pool.Do(ctx, FlatCmd(nil, "ECHO", time.Now().UnixNano()))
			case 4:
				var res string
				err = pool.Do(ctx, echo.Cmd(&res, nil, fmt.Sprint(time.Now().UnixNano())))
			case 5:
				var res string
				err = pool.Do(ctx, echo.FlatCmd(&res, nil, time.Now().UnixNano()))
			case 6:
				err = pool.Do(ctx, echo.Cmd(nil, nil, fmt.Sprint(time.Now().UnixNano())))
			case 7:
				err = pool.Do(ctx, echo.FlatCmd(nil, nil, time.Now().UnixNano()))
			case 8:
				var res string
				p := NewPipeline()
				p.Append(Cmd(nil, "ECHO", fmt.Sprint(time.Now().UnixNano())))
				p.Append(FlatCmd(&res, "ECHO", time.Now().UnixNano()))
				err = pool.Do(ctx, p)
			}

			if err != nil {
				b.Fatal(err)
			}
		}
	})
}

Basically doing tons of different kinds of commands in parallel on the same pool. I didn't run into any panics or other errors. Which isn't to say the problem is definitely in your code, but I'm not able to reproduce the problem. Can you provide a code sample which reproduces the error?

@kollektiv I'm going to close this issue for now. If you're still seeing this issue I would very much like to hear about it so please re-open in that case. Thanks for trying out v4!