ClickHouse/ch-go

invalid memory address or nil pointer dereference in query.go:99

MaXal opened this issue · 9 comments

MaXal commented

Describe the bug

Under heavy load I get the following error:

golang.org/x/sync@v0.2.0/errgroup/errgroup.go:72 +0xa5
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/sync@v0.2.0/errgroup/errgroup.go:75 +0x64
golang.org/x/sync/errgroup.(*Group).Go.func1()
github.com/ClickHouse/ch-go@v0.56.0/query.go:622 +0x85
github.com/ClickHouse/ch-go.(*Client).Do.func3()
github.com/ClickHouse/ch-go@v0.56.0/query.go:99 +0x635
github.com/ClickHouse/ch-go.(*Client).sendQuery(_, {_, _}, {{0xc0011722c0, 0x1a6}, {0xc002823dd0, 0x24}, {0x0, 0x0}, {0x0, ...}, ...})
goroutine 26759 [running]:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x86ec35]
panic: runtime error: invalid memory address or nil pointer dereference

Steps to reproduce

Unfortunately, the issue is reproducible only in production and I can't reproduce it locally.

Environment

  • Client version: 0.56.0
  • Language version: 1.20.4
  • OS: Linux

ClickHouse server

  • ClickHouse Server version: 22.10.3.27
MaXal commented

I've tried to rewrite the code in sendQuery to be as safe as possible on connection and local address: https://github.com/MaXal/ch-go/blob/165043dbdfd3b483c72d212849fe3006976550fd/query.go#L78
but the error still occurs on:

	golang.org/x/sync@v0.2.0/errgroup/errgroup.go:72 +0xa5
created by golang.org/x/sync/errgroup.(*Group).Go
	golang.org/x/sync@v0.2.0/errgroup/errgroup.go:75 +0x64
golang.org/x/sync/errgroup.(*Group).Go.func1()
	github.com/ClickHouse/ch-go@v0.56.0/query.go:629 +0x85
github.com/ClickHouse/ch-go.(*Client).Do.func3()
	github.com/ClickHouse/ch-go@v0.56.0/query.go:88 +0xa18
github.com/ClickHouse/ch-go.(*Client).sendQuery(_, {_, _}, {{0xc001351500, 0x4b9}, {0xc000d39050, 0x24}, {0x0, 0x0}, {0x0, ...}, ...})
	github.com/ClickHouse/ch-go@v0.56.0/client.go:289
github.com/ClickHouse/ch-go.(*Client).encode(...)
	github.com/ClickHouse/ch-go@v0.56.0/proto/query.go:185 +0x3b
github.com/ClickHouse/ch-go/proto.Query.EncodeAware({{0xc000d39050, 0x24}, {0xc001351500, 0x4b9}, {0x0, 0x0}, 0x2, 0x0, {0xd4bc, 0x0, ...}, ...}, ...)
goroutine 189016 [running]:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5db57b]
panic: runtime error: invalid memory address or nil pointer dereference
MaXal commented

It looks like I figured out the reason, the issue happens due to #236. The query closes the client and makes the connection and buffers to be nil which happens after the check for c.conn == nil.
After I reverted the commit, there are no more panics.

ernado commented

Are you making concurrent Do() calls?

MaXal commented

On each request, we acquire a client from the pool and perform Do() call, after each Do, we release the client back to the pool. So there are concurrent Do() calls from different clients but it looks like there are no concurrent Do() calls for the same client.

What we do have is quite a lot of:

"msg":"cannot handle http request","error":"packet: uvarint: read: read: EOF"

that I assume happens because Clickhouse is overloaded so we have a lot of queries cancelations.

The code is available: https://github.com/JetBrains/ij-perf-report-aggregator/blob/35266481c781d75e96c4005a9211a65666d67a80/pkg/data-query/db.go#L48

Thank you a lot for looking into it!

ernado commented

Looks like connection was closed while reading responce from server.
I'll take a look into this case.

ernado commented

Please try v0.56.1

MaXal commented

🙏 I'm genuinely overcome with gratitude for the fix. It not only addresses the NPE, but it completely resolves all our existing connection issues, a feat that leaves me in awe. I've been successful in executing queries that had, until now, been perpetually failing, returning perplexing messages such as:

handle packet: UNEXPECTED_PACKET_FROM_CLIENT (101): DB::NetException: Unexpected packet Hello received from client

packet: uvarint: read: read: EOF

We've spent years grappling with these challenges, considering whether the root cause lies with Clickhouse or our resources. I am deeply thankful for your expertise and hard work! Thank you for providing such a stellar solution.

ernado commented

Great, I’m so happy that it worked so well!

Thank you 💜

So I’m closing this issue for now.

Thanks for fixing this. I have the same problem and I tried to fix it by commenting the defer function in the Close. Now I can use the official client instead of my own fork. :)