invalid memory address or nil pointer dereference in query.go:99
MaXal opened this issue · 9 comments
Describe the bug
Under heavy load I get the following error:
golang.org/x/sync@v0.2.0/errgroup/errgroup.go:72 +0xa5
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/sync@v0.2.0/errgroup/errgroup.go:75 +0x64
golang.org/x/sync/errgroup.(*Group).Go.func1()
github.com/ClickHouse/ch-go@v0.56.0/query.go:622 +0x85
github.com/ClickHouse/ch-go.(*Client).Do.func3()
github.com/ClickHouse/ch-go@v0.56.0/query.go:99 +0x635
github.com/ClickHouse/ch-go.(*Client).sendQuery(_, {_, _}, {{0xc0011722c0, 0x1a6}, {0xc002823dd0, 0x24}, {0x0, 0x0}, {0x0, ...}, ...})
goroutine 26759 [running]:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x86ec35]
panic: runtime error: invalid memory address or nil pointer dereference
Steps to reproduce
Unfortunately, the issue is reproducible only in production and I can't reproduce it locally.
Environment
- Client version: 0.56.0
- Language version: 1.20.4
- OS: Linux
ClickHouse server
- ClickHouse Server version: 22.10.3.27
I've tried to rewrite the code in sendQuery
to be as safe as possible on connection and local address: https://github.com/MaXal/ch-go/blob/165043dbdfd3b483c72d212849fe3006976550fd/query.go#L78
but the error still occurs on:
golang.org/x/sync@v0.2.0/errgroup/errgroup.go:72 +0xa5
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/sync@v0.2.0/errgroup/errgroup.go:75 +0x64
golang.org/x/sync/errgroup.(*Group).Go.func1()
github.com/ClickHouse/ch-go@v0.56.0/query.go:629 +0x85
github.com/ClickHouse/ch-go.(*Client).Do.func3()
github.com/ClickHouse/ch-go@v0.56.0/query.go:88 +0xa18
github.com/ClickHouse/ch-go.(*Client).sendQuery(_, {_, _}, {{0xc001351500, 0x4b9}, {0xc000d39050, 0x24}, {0x0, 0x0}, {0x0, ...}, ...})
github.com/ClickHouse/ch-go@v0.56.0/client.go:289
github.com/ClickHouse/ch-go.(*Client).encode(...)
github.com/ClickHouse/ch-go@v0.56.0/proto/query.go:185 +0x3b
github.com/ClickHouse/ch-go/proto.Query.EncodeAware({{0xc000d39050, 0x24}, {0xc001351500, 0x4b9}, {0x0, 0x0}, 0x2, 0x0, {0xd4bc, 0x0, ...}, ...}, ...)
goroutine 189016 [running]:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5db57b]
panic: runtime error: invalid memory address or nil pointer dereference
It looks like I figured out the reason, the issue happens due to #236. The query closes the client and makes the connection and buffers to be nil
which happens after the check for c.conn == nil
.
After I reverted the commit, there are no more panics.
Are you making concurrent Do() calls?
On each request, we acquire a client from the pool and perform Do() call, after each Do, we release the client back to the pool. So there are concurrent Do() calls from different clients but it looks like there are no concurrent Do() calls for the same client.
What we do have is quite a lot of:
"msg":"cannot handle http request","error":"packet: uvarint: read: read: EOF"
that I assume happens because Clickhouse is overloaded so we have a lot of queries cancelations.
The code is available: https://github.com/JetBrains/ij-perf-report-aggregator/blob/35266481c781d75e96c4005a9211a65666d67a80/pkg/data-query/db.go#L48
Thank you a lot for looking into it!
Looks like connection was closed while reading responce from server.
I'll take a look into this case.
Please try v0.56.1
🙏 I'm genuinely overcome with gratitude for the fix. It not only addresses the NPE, but it completely resolves all our existing connection issues, a feat that leaves me in awe. I've been successful in executing queries that had, until now, been perpetually failing, returning perplexing messages such as:
handle packet: UNEXPECTED_PACKET_FROM_CLIENT (101): DB::NetException: Unexpected packet Hello received from client
packet: uvarint: read: read: EOF
We've spent years grappling with these challenges, considering whether the root cause lies with Clickhouse or our resources. I am deeply thankful for your expertise and hard work! Thank you for providing such a stellar solution.
Great, I’m so happy that it worked so well!
Thank you 💜
So I’m closing this issue for now.
Thanks for fixing this. I have the same problem and I tried to fix it by commenting the defer function in the Close. Now I can use the official client instead of my own fork. :)