kuskoman/logstash-exporter

Logstash Exporter keeps crashing

luisfelipegarcia opened this issue · 13 comments

Description of the Issue

Running v1.6.3 (ARM)
Logstash 8.12.0
Ubuntu 22.04

I have noticed that logstash exporter keeps crashing. The logs show a "fatal error: schedule: holding locks" message.
Restarting the service brings the application back up, but it eventually runs into the same issue.

Version of logstash-exporter, or logstash-exporter Image

v1.6.3

Version of Chart (if applicable)

No response

Operating System/Environment

Ubuntu 22.04

Logs

Mar 05 23:55:55 ip-172-31-0-158 logstash-exporter[1360]: time=2024-03-05T23:55:55.645Z level=INFO msg="starting server on" host="" port=9>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: fatal error: schedule: holding locks
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: panic during panic
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime stack:
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.throw({0x49a888, 0x17})
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/panic.go:1023 +0x4c fp=0x>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.schedule()
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/proc.go:3843 +0x2fc fp=0x>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.goschedImpl(0x1484c68, 0x1)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/proc.go:4065 +0x198 fp=0x>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.gopreempt_m(...)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/proc.go:4082
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.newstack()
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/stack.go:1070 +0x3b0 fp=0>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.morestack()
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/asm_arm.s:383 +0x60 fp=0x>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: goroutine 1 gp=0x1402128 m=nil [IO wait, 59 minutes]:
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.gopark(0x4b7118, 0xf7870f08, 0x2, 0x2, 0x5)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/proc.go:402 +0x104 fp=0x1>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: runtime.netpollblock(0xf7870ef8, 0x72, 0x0)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/netpoll.go:573 +0x100 fp=>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: internal/poll.runtime_pollWait(0xf7870ef8, 0x72)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/runtime/netpoll.go:345 +0x54 fp=0>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: internal/poll.(*pollDesc).wait(0x14181a8, 0x72, 0x0)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/internal/poll/fd_poll_runtime.go:>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: internal/poll.(*pollDesc).waitRead(...)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/internal/poll/fd_poll_runtime.go:>
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]: internal/poll.(*FD).Accept(0x1418190)
Mar 06 00:56:16 ip-172-31-0-158 logstash-exporter[1360]:         /opt/hostedtoolcache/go/1.22.0/x64/src/internal/poll/fd_unix.go:611 +0x2>

hello @luisfelipegarcia
are you able to provide any kind of example replicating this behaviour?
since the error is concurrency-related it may be very hard to debug

@kuskoman, I don't have an example to replicate it. It just dies for some reason, and this is on a low-volume Logstash instance at that. Are there any other logs I can provide to help pinpoint the issue?

@luisfelipegarcia I wonder if this error is more likely to happen on the ARM Linux build. I will try to test that.

for now, since it does not seem like an easy fix it may be open for a while.
could you please check if the error is the same when using the latest prerelease?

I can do that. I'll test and post an update here.

@kuskoman , I tried running the latest prerelease but am running into the same issue as #300

time=2024-03-13T19:14:28.458Z level=ERROR msg="executor failed" name=nodestats duration=55.712598ms err="json: cannot unmarshal number 17179869184 into Go struct field .jvm.mem.heap_committed_in_bytes of type int"
time=2024-03-13T19:15:28.456Z level=ERROR msg="executor failed" name=nodestats duration=54.499492ms err="json: cannot unmarshal number 17179869184 into Go struct field .jvm.mem.heap_committed_in_bytes of type int"
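The error above is consistent with the build architecture: on a 32-bit ARM build, Go's `int` is 32 bits wide, and 17179869184 (2^34) does not fit. The following is a minimal sketch of that failure mode, not the exporter's actual code. The struct and function names are hypothetical, and `int32` stands in for `int` so the behaviour reproduces on any platform.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// heapStats32 is a hypothetical struct. int32 stands in for Go's int on a
// 32-bit ARM build, so the overflow reproduces on 64-bit machines too.
type heapStats32 struct {
	HeapCommittedInBytes int32 `json:"heap_committed_in_bytes"`
}

// heapStats64 is the same hypothetical struct with an explicitly 64-bit
// field, whose width does not depend on the build architecture.
type heapStats64 struct {
	HeapCommittedInBytes int64 `json:"heap_committed_in_bytes"`
}

// decodeHeap32 decodes the payload into the 32-bit field.
func decodeHeap32(payload []byte) (int32, error) {
	var s heapStats32
	err := json.Unmarshal(payload, &s)
	return s.HeapCommittedInBytes, err
}

// decodeHeap64 decodes the payload into the 64-bit field.
func decodeHeap64(payload []byte) (int64, error) {
	var s heapStats64
	err := json.Unmarshal(payload, &s)
	return s.HeapCommittedInBytes, err
}

func main() {
	// The value is taken from the log line in this thread.
	payload := []byte(`{"heap_committed_in_bytes": 17179869184}`)

	if _, err := decodeHeap32(payload); err != nil {
		fmt.Println("32-bit field:", err)
	}

	v, err := decodeHeap64(payload)
	fmt.Println("64-bit field:", v, err)
}
```

Declaring such fields as `int64` rather than `int` keeps the decoding behaviour identical across amd64 and 32-bit ARM builds.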

@luisfelipegarcia it seems I missed this in v2. I will fix v2 and let you know when to check it again.

@kuskoman , any update on this?

Thanks for the reminder. I created a PR to fix the data types.
I decided to drop handling of uint values, because some values may hold -1 and I don't want to check each and every one.
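To illustrate the trade-off described here: an unsigned field rejects a -1 value at decode time, while a signed 64-bit field accepts both -1 and large counters. This is a sketch only; the struct names and the `value` key are hypothetical and are not taken from the exporter or from the Logstash API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unsignedStat is a hypothetical struct with an unsigned field.
type unsignedStat struct {
	Value uint64 `json:"value"`
}

// signedStat is the same hypothetical struct with a signed 64-bit field.
type signedStat struct {
	Value int64 `json:"value"`
}

// decodeUnsigned decodes the payload into the uint64 field.
func decodeUnsigned(payload string) (uint64, error) {
	var s unsignedStat
	err := json.Unmarshal([]byte(payload), &s)
	return s.Value, err
}

// decodeSigned decodes the payload into the int64 field.
func decodeSigned(payload string) (int64, error) {
	var s signedStat
	err := json.Unmarshal([]byte(payload), &s)
	return s.Value, err
}

func main() {
	// A -1 value cannot be stored in an unsigned field.
	if _, err := decodeUnsigned(`{"value": -1}`); err != nil {
		fmt.Println("uint64:", err)
	}

	// int64 accepts the -1 value as well as counters beyond the 32-bit range.
	for _, payload := range []string{`{"value": -1}`, `{"value": 8035345869}`} {
		v, err := decodeSigned(payload)
		fmt.Println("int64:", v, err)
	}
}
```

Using `int64` everywhere gives up the upper half of the `uint64` range, which is far beyond the counter values that appear in this thread.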

@luisfelipegarcia check v2.0.0-pre6

I just tested v2.0.0-pre6. Unfortunately I am still getting the "cannot unmarshal number..." errors

time=2024-03-19T18:34:47.192Z level=ERROR msg="executor failed" name=nodestats duration=198.623779ms err="json: cannot unmarshal number 2923200586 into Go struct field .pipelines.events.out of type int"
time=2024-03-19T18:35:47.299Z level=ERROR msg="executor failed" name=nodestats duration=306.281444ms err="json: cannot unmarshal number 2924965461 into Go struct field .pipelines.events.out of type int"
time=2024-03-19T18:36:47.150Z level=ERROR msg="executor failed" name=nodestats duration=157.919356ms err="json: cannot unmarshal number 2926774586 into Go struct field .pipelines.events.out of type int"
time=2024-03-19T18:37:47.169Z level=ERROR msg="executor failed" name=nodestats duration=176.531401ms err="json: cannot unmarshal number 2928573836 into Go struct field .pipelines.events.out of type int"

@luisfelipegarcia well, i did not expect this particular stat to be that big
could you check v2.0.0-pre7?

@kuskoman, same issue with pre7: "cannot unmarshal..."

...# ./logstash-exporter-linux-arm --config config.yml
time=2024-03-21T15:31:41.329Z level=WARN msg="failed to load .env file" error="open .env: no such file or directory"
time=2024-03-21T15:31:41.330Z level=INFO msg="Version: unknown, SemanticVersion: unknown, GitCommit: unknown, GoVersion: go1.22.1, BuildArch: arm, BuildOS: linux, BuildDate: unknown"
time=2024-03-21T15:31:41.330Z level=INFO msg="starting server on" host=0.0.0.0 port=9198
time=2024-03-21T15:31:47.071Z level=ERROR msg="executor failed" name=nodestats duration=76.210261ms err="json: cannot unmarshal number 8035345869 into Go struct field .pipelines.plugins.inputs.events.out of type int"
time=2024-03-21T15:32:47.072Z level=ERROR msg="executor failed" name=nodestats duration=78.437605ms err="json: cannot unmarshal number 8037152956 into Go struct field .pipelines.plugins.inputs.events.out of type int"
time=2024-03-21T15:33:47.134Z level=ERROR msg="executor failed" name=nodestats duration=140.536317ms err="json: cannot unmarshal number 8039006157 into Go struct field .pipelines.plugins.inputs.events.out of type int"

@luisfelipegarcia would you be able to provide a redacted dump from the nodestats endpoint of Logstash, so I can see which metrics could grow fast enough to overflow a 32-bit integer?
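One way to answer that question from a dump is to walk the JSON and list every integer that falls outside the signed 32-bit range. The sketch below is not part of the exporter, and the helper names `findLargeNumbers` and `scan` are hypothetical. The sample payload is a heavily trimmed, illustrative shape that reuses two values from the logs in this thread; a real node stats response has many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
	"strings"
)

// findLargeNumbers walks a decoded JSON value and records the path of every
// integer that does not fit in a signed 32-bit integer.
func findLargeNumbers(path string, node any, out map[string]int64) {
	switch v := node.(type) {
	case map[string]any:
		for key, child := range v {
			findLargeNumbers(path+"."+key, child, out)
		}
	case []any:
		for i, child := range v {
			findLargeNumbers(fmt.Sprintf("%s[%d]", path, i), child, out)
		}
	case json.Number:
		n, err := v.Int64()
		if err != nil {
			return // floats and values beyond int64 are skipped in this sketch
		}
		if n > math.MaxInt32 || n < math.MinInt32 {
			out[path] = n
		}
	}
}

// scan decodes the payload with UseNumber, so integers keep full precision
// instead of being converted to float64, and returns the offending paths.
func scan(payload string) (map[string]int64, error) {
	dec := json.NewDecoder(strings.NewReader(payload))
	dec.UseNumber()

	var root any
	if err := dec.Decode(&root); err != nil {
		return nil, err
	}

	out := map[string]int64{}
	findLargeNumbers("", root, out)
	return out, nil
}

func main() {
	// Illustrative payload only; not a complete node stats response.
	sample := `{
		"jvm": {"mem": {"heap_committed_in_bytes": 17179869184, "heap_used_percent": 12}},
		"pipelines": {"main": {"events": {"out": 2923200586}}}
	}`

	found, err := scan(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}

	// Sort the paths so the report is stable between runs.
	paths := make([]string, 0, len(found))
	for p := range found {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%s = %d\n", p, found[p])
	}
}
```

Counters that are still below the 32-bit limit at the time of the dump will not be flagged, so every cumulative event or byte counter remains a candidate for `int64` regardless of its current value.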