Connection pool panic (v4)
Closed this issue · 12 comments
Hello, running into a connection pool panic with the pool size set to 30
panic: pushing onto front at 29 but it already has an element
goroutine 11 [running]:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront(0xc000102960, 0xc0011cae20)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool_conn_coll.go:84 +0x11f
github.com/mediocregopher/radix/v4.(*pool).putConn.func1(0xc000100b1c, 0xc000639b01)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool.go:387 +0x3f
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock(0xc000100ae0, 0xc00a4e1e58, 0x0, 0x0)
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:116 +0xb9
github.com/mediocregopher/radix/v4.(*pool).putConn(0xc000144120, 0xc0011cae20, 0xc000639b60)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool.go:386 +0x65
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1(0xc49d00, 0xc00010c240)
/root/app/vendor/github.com/mediocregopher/radix/v4/pool.go:331 +0x347
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1(0xc00000c280, 0xc000100ae0)
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:45 +0x3c
created by github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:44 +0x66
Hi @kollektiv , thanks for testing out v4! I just pushed a fix to the poolConnColl
where it's internal state would become inconsistent in certain circumstances, specifically when a connection was being discarded due to having errored. So that may have been the cause of your panic here, but let me know if you're still seeing it even with this fix.
I believe this bug also caused your #253 issue.
Finally, I think your #255 issue is likely the same as #252, which I also just pushed a fix for. So if you could retest and let me know if you're seeing any of these issues still that'd be great, and if not please close the relevant issues so I know it's not still a problem. Thanks!
I have this same crash on the updated code (github.com/mediocregopher/radix/v4@9b8ee0e8032b6fe2a8649824cbfce11d00cd8774
). I'm not having any luck creating a minimal reproduction right now, but here's what I can give you.
I am running a mixture of Cmd
, FlatCmd
and EvalScript
from a pool of the default size (4). This is coming from a test in a suite of integration tests, and only fails when this test is run in the suite, it does not fail when isolated. All of my testing is happy path, and I am not (intentionally) causing Redis errors.
Panic
panic: pushing onto front at 0 but it already has an element
goroutine 51 [running]:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront(0xc00068e930, 0xc0003fdcc0)
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool_conn_coll.go:85 +0x22d
github.com/mediocregopher/radix/v4.(*pool).putConn.func1(0x0, 0x0)
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:387 +0x65
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock(0xc000045140, 0xc000513d98, 0x0, 0x0)
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:116 +0xf2
github.com/mediocregopher/radix/v4.(*pool).putConn(0xc00044a240, 0xc0003fdcc0, 0xc000a84700)
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:386 +0x89
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1(0x1fb9140, 0xc0002eb500)
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:331 +0x545
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1(0xc0009d4200, 0xc000045140)
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45 +0x48
created by github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run
<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44 +0x6b
Go-routine Dump (Unique stacks) (I've removed everything that isn't from radix)
Goroutine 14
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:67)
github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 15
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 28
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*conn).EncodeDecode (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:135)
github.com/mediocregopher/radix/v4.(*poolConn).EncodeDecode (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:34)
github.com/mediocregopher/radix/v4.(*evalAction).Perform.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:461)
github.com/mediocregopher/radix/v4.(*evalAction).Perform (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:464)
github.com/mediocregopher/radix/v4.(*poolConn).Do (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:45)
github.com/mediocregopher/radix/v4.(*pool).useSharedConn.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:407)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithRLock (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:104)
github.com/mediocregopher/radix/v4.(*pool).useSharedConn (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:400)
github.com/mediocregopher/radix/v4.(*pool).Do (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:429)
<snip, code under test>
github.com/stretchr/testify/suite.(*Suite).Run.func1 (<snip>/go/pkg/mod/github.com/stretchr/testify@v1.6.1/suite/suite.go:77)
testing.tRunner (/usr/local/Cellar/go/1.15.6/libexec/src/testing/testing.go:1123)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: testing.(*T).Run (/usr/local/Cellar/go/1.15.6/libexec/src/testing/testing.go:1168)
Goroutine 41
runtime.fatalpanic (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:1189)
runtime.gopanic (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/panic.go:1064)
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool_conn_coll.go:85)
github.com/mediocregopher/radix/v4.(*pool).putConn.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:387)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:116)
github.com/mediocregopher/radix/v4.(*pool).putConn (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:386)
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:331)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 43
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*pool).atIntervalDo.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:291)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 46
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:67)
github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 47
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 59
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:307)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 61
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*pool).atIntervalDo.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:291)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 71
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.netpollblock (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/netpoll.go:438)
internal/poll.runtime_pollWait (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/netpoll.go:222)
internal/poll.(*pollDesc).wait (/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:87)
internal/poll.(*pollDesc).waitRead (/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_poll_runtime.go:92)
internal/poll.(*FD).Read (/usr/local/Cellar/go/1.15.6/libexec/src/internal/poll/fd_unix.go:159)
net.(*netFD).Read (/usr/local/Cellar/go/1.15.6/libexec/src/net/fd_posix.go:55)
net.(*conn).Read (/usr/local/Cellar/go/1.15.6/libexec/src/net/net.go:182)
github.com/mediocregopher/radix/v4.(*connAddrWrap).Read (<autogenerated>:1)
bufio.(*Reader).fill (/usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:101)
bufio.(*Reader).Peek (/usr/local/Cellar/go/1.15.6/libexec/src/bufio/bufio.go:139)
github.com/mediocregopher/radix/v4/resp/resp3.discardAttribute (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:149)
github.com/mediocregopher/radix/v4/resp/resp3.peekAndAssertPrefix (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:183)
github.com/mediocregopher/radix/v4/resp/resp3.readAndAssertPrefix (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:217)
github.com/mediocregopher/radix/v4/resp/resp3.unmarshalAggHeader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:892)
github.com/mediocregopher/radix/v4/resp/resp3.(*ArrayHeader).UnmarshalRESP (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:945)
github.com/mediocregopher/radix/v4.Tuple.UnmarshalRESP (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:327)
github.com/mediocregopher/radix/v4.(*Tuple).UnmarshalRESP (<autogenerated>:1)
github.com/mediocregopher/radix/v4/resp/resp3.Unmarshal (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:1661)
github.com/mediocregopher/radix/v4.(*evalAction).UnmarshalRESP (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/action.go:455)
github.com/mediocregopher/radix/v4/resp/resp3.Unmarshal (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/resp/resp3/resp.go:1661)
github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:85)
github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 72
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 73
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:307)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 75
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*pool).atIntervalDo.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/pool.go:291)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 94
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*conn).reader (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:67)
github.com/mediocregopher/radix/v4.(*conn).reader-fm (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:64)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
Goroutine 95
runtime.gopark (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/proc.go:307)
runtime.selectgo (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/select.go:338)
github.com/mediocregopher/radix/v4.(*connWriter).run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:142)
github.com/mediocregopher/radix/v4.(*conn).writer (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn_writer.go:53)
github.com/mediocregopher/radix/v4.Dialer.Dial.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/conn.go:334)
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:45)
runtime.goexit (/usr/local/Cellar/go/1.15.6/libexec/src/runtime/asm_amd64.s:1374)
created at: github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run (<snip>/go/pkg/mod/github.com/mediocregopher/radix/v4@v4.0.0-beta.1.0.20210117001606-9b8ee0e8032b/internal/proc/proc.go:44)
I did have a test that was essentially running 100 EvalScript
s in 100 threads crashing for the same reason, but the very recent commits have fixed that
I chucked some printing breakpoints on the modifications of p.s
in pool_conn_coll.go
just in case this helps, since I think it's some of the logic in there might be at fault. The block of code in this comment is a log of when the lines of code are executed and their stack when they are.
I'm going to guess that it's either:
- hole in the removal logic. Somehow the Wraparound logic on
pool_conn_coll:73
andpushFront
aren't playing well together. - a hole in the thread safety. The modifications to the slice and
p.first
updates aren't atomic, so there might be a lock problem running, something running two modifications at the same time. They all seem to be going in throughWithLock
, and it looks likepool.proc
isn't reset so it seems solid. Just a thought.
Breakpoint reached: pool_conn_coll.go:87
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
github.com/mediocregopher/radix/v4.poolConfig.new at pool.go:187
github.com/mediocregopher/radix/v4.PoolConfig.New at pool.go:228
<SNIP>
Breakpoint reached: pool_conn_coll.go:69
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:69
github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
<SNIP>
Breakpoint reached: pool_conn_coll.go:73
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:73
github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
<SNIP>
Breakpoint reached: pool_conn_coll.go:87
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
github.com/mediocregopher/radix/v4.poolConfig.new at pool.go:187
github.com/mediocregopher/radix/v4.PoolConfig.New at pool.go:228
<SNIP>
Breakpoint reached: pool_conn_coll.go:69
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:69
github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
<SNIP>
Breakpoint reached: pool_conn_coll.go:73
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).remove at pool_conn_coll.go:73
github.com/mediocregopher/radix/v4.(*pool).checkConn.func1 at pool.go:351
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).checkConn at pool.go:350
github.com/mediocregopher/radix/v4.(*pool).useSharedConn at pool.go:410
github.com/mediocregopher/radix/v4.(*pool).Do at pool.go:429
<SNIP>
Breakpoint reached: pool_conn_coll.go:87
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
github.com/mediocregopher/radix/v4.poolConfig.new at pool.go:187
github.com/mediocregopher/radix/v4.PoolConfig.New at pool.go:228
<SNIP>
Breakpoint reached: pool_conn_coll.go:87
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 at pool.go:331
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 at proc.go:45
runtime.goexit at asm_amd64.s:1374
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run at proc.go:44
Breakpoint reached: pool_conn_coll.go:87
Stack:
github.com/mediocregopher/radix/v4.(*poolConnColl).pushFront at pool_conn_coll.go:87
github.com/mediocregopher/radix/v4.(*pool).putConn.func1 at pool.go:387
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).WithLock at proc.go:116
github.com/mediocregopher/radix/v4.(*pool).putConn at pool.go:386
github.com/mediocregopher/radix/v4.(*pool).runReconnect.func1 at pool.go:331
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1 at proc.go:45
runtime.goexit at asm_amd64.s:1374
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run at proc.go:44
This is followed by the panic in my first comment.
@thessem thanks so much for all that debugging it was very helpful. I noticed that you're getting a lot of remove
calls on your poolConnColl, which indicated to me that you're getting a lot of network errors resulting in connection's being closed and remade. Since I couldn't reproduce any errors on the poolConnColl directly I thought that maybe there's a bug in the reconnect logic which is causing too many connections to be made.
Turns out there was a bug, which I just pushed a fix for. I'm a bit concerned about the fact that you're seeing errors in the first place though. Were you seeing network errors (or any other non-redis error) when you ran the test code? Were they expected?
@mediocregopher I hadn't actually thought about it that much! My test clean-up was cancelling the context and then immediately performing a Do
that didn't have its error checked (oops), using the cancelled context. This was causing the connection errors (the err was context cancelled
, which is what you'd expect). Fixing that bug in my test code prevents this crash.
9f32b1f fixes my problem too.
Given that revelation, here's my reproduction of my crash. Unfortunately it's a little racy, but this crashes 75% of the time for me (but doesn't crash on 9f32b1f
, so this is fixed, and I'm only giving you this just in case it satisfies your intellectual curiosity)
package main
import (
"context"
"time"
"github.com/mediocregopher/radix/v4"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
client, err := radix.PoolConfig{
Size: 1,
}.New(
ctx,
"tcp",
"localhost:36379",
)
if err != nil {
panic(err)
}
go func() {
_ = client.Do(ctx, radix.Cmd(nil, "BLPOP", "foobar", "2"))
}()
go func() {
cancel()
}()
_ = client.Do(ctx, radix.Cmd(nil, "PING"))
time.Sleep(3*time.Second)
}
panic: pushing onto front at 0 but it already has an element
goroutine 52 [running]:
github.com/mediocregopher/radix.(*poolConnColl).pushFront(0xc00011e600, 0xc00000e180)
<snip>/go/src/github.com/mediocregopher/radix/pool_conn_coll.go:85 +0x11f
github.com/mediocregopher/radix.(*pool).putConn.func1(0xc0001381bc, 0xc000182401)
<snip>/go/src/github.com/mediocregopher/radix/pool.go:387 +0x3f
github.com/mediocregopher/radix/internal/proc.(*Proc).WithLock(0xc000138180, 0xc000151e58, 0x0, 0x0)
<snip>/go/src/github.com/mediocregopher/radix/internal/proc/proc.go:116 +0xb9
github.com/mediocregopher/radix.(*pool).putConn(0xc000144120, 0xc00000e180, 0xc000182480)
<snip>/go/src/github.com/mediocregopher/radix/pool.go:386 +0x65
github.com/mediocregopher/radix.(*pool).runReconnect.func1(0x11b9ee0, 0xc00014e040)
<snip>/go/src/github.com/mediocregopher/radix/pool.go:331 +0x347
github.com/mediocregopher/radix/internal/proc.(*Proc).Run.func1(0xc0001880c0, 0xc000138180)
<snip>/go/src/github.com/mediocregopher/radix/internal/proc/proc.go:45 +0x3c
created by github.com/mediocregopher/radix/internal/proc.(*Proc).Run
<snip>/go/src/github.com/mediocregopher/radix/internal/proc/proc.go:44 +0x66
exit status 2
Hi @mediocregopher, thank you for pushing the fixes
I tried out 9f32b1f, and ran into this panic. The pool size is configured to be 30
panic: runtime error: index out of range [2] with length 1
goroutine 1126708 [running]:
github.com/mediocregopher/radix/v4.(*cmdAction).MarshalRESP(0xc007994a10, 0xc3f240, 0xc008f4d1c0, 0xc0075c6ea0, 0xc007994a10, 0x1)
/root/app/vendor/github.com/mediocregopher/radix/v4/action.go:256 +0x1aa
github.com/mediocregopher/radix/v4/resp/resp3.Marshal(0xc3f240, 0xc008f4d1c0, 0xabdce0, 0xc007994a10, 0xc0075c6ea0, 0x0, 0x0)
/root/app/vendor/github.com/mediocregopher/radix/v4/resp/resp3/resp.go:1356 +0x1cce
github.com/mediocregopher/radix/v4.(*connWriter).write(0xc005cfdee8, 0xc4af80, 0xc008b85b60, 0xabdce0, 0xc007994a10, 0xabdce0, 0xc007994a10, 0xc0078cc7e0, 0xa7cb01)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:97 +0x177
github.com/mediocregopher/radix/v4.(*connWriter).flush(0xc005cfdee8, 0xc005cfdd90)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:114 +0x150
github.com/mediocregopher/radix/v4.(*connWriter).run(0xc005cfdee8)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:154 +0x236
github.com/mediocregopher/radix/v4.(*conn).writer(0xc00b191680, 0xc4af00, 0xc008f4d180, 0x0)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn_writer.go:53 +0x10b
github.com/mediocregopher/radix/v4.Dialer.Dial.func1(0xc4af00, 0xc008f4d180)
/root/app/vendor/github.com/mediocregopher/radix/v4/conn.go:334 +0x47
github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run.func1(0xc0075cdf80, 0xc00b191500)
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:45 +0x3c
created by github.com/mediocregopher/radix/v4/internal/proc.(*Proc).Run
/root/app/vendor/github.com/mediocregopher/radix/v4/internal/proc/proc.go:44 +0x66
@kollektiv here is the code which appears to be panicking:
for i := range c.args {
err = marshalBlobString(err, w, c.args[i], o)
}
with the panic happening on the c.args[i]
. The only way for that to be possible is if c.args
is being changed from another go-routine during that loop. Can you confirm that you're not using the same Cmd
/FlatCmd
instance more than once? Each instance can be passed to a Do
only once.
Hey @mediocregopher, I double checked and the same Cmd / FlatCmd
is not being used more than once.
They are being reused and stored in a sync.Pool
in radix's action.go
. Perhaps they could be reused from some other issue?
@kollektiv I tried to repro via this benchmark:
func BenchmarkPool(b *B) {
ctx := context.Background()
pool, err := (PoolConfig{}).New(ctx, "tcp", "127.0.0.1:6379")
if err != nil {
panic(err)
}
echo := NewEvalScript(`return ARGV[1]`)
b.ResetTimer()
b.RunParallel(func(pb *PB) {
for pb.Next() {
var err error
switch rand.Intn(9) {
case 0:
var res string
err = pool.Do(ctx, Cmd(&res, "ECHO", fmt.Sprint(time.Now().UnixNano())))
case 1:
var res string
err = pool.Do(ctx, FlatCmd(&res, "ECHO", time.Now().UnixNano()))
case 2:
err = pool.Do(ctx, Cmd(nil, "ECHO", fmt.Sprint(time.Now().UnixNano())))
case 3:
err = pool.Do(ctx, FlatCmd(nil, "ECHO", time.Now().UnixNano()))
case 4:
var res string
err = pool.Do(ctx, echo.Cmd(&res, nil, fmt.Sprint(time.Now().UnixNano())))
case 5:
var res string
err = pool.Do(ctx, echo.FlatCmd(&res, nil, time.Now().UnixNano()))
case 6:
err = pool.Do(ctx, echo.Cmd(nil, nil, fmt.Sprint(time.Now().UnixNano())))
case 7:
err = pool.Do(ctx, echo.FlatCmd(nil, nil, time.Now().UnixNano()))
case 8:
var res string
p := NewPipeline()
p.Append(Cmd(nil, "ECHO", fmt.Sprint(time.Now().UnixNano())))
p.Append(FlatCmd(&res, "ECHO", time.Now().UnixNano()))
err = pool.Do(ctx, p)
}
if err != nil {
b.Fatal(err)
}
}
})
}
Basically doing tons of different kinds of commands in parallel on the same pool. I didn't run into any panics or other errors. Which isn't to say the problem is definitely in your code, but I'm not able to reproduce the problem. Can you provide a code sample which reproduces the error?
@kollektiv I'm going to close this issue for now. If you're still seeing this issue I would very much like to hear about it so please re-open in that case. Thanks for trying out v4!