InfluxDB crashes since update to 4.5
welcoMattic opened this issue · 5 comments
Problem/Motivation
InfluxDB often crashes on my Raspberry Pi 4 with 2 GB of RAM.
I set the INFLUXDB_DATA_INDEX_VERSION=tsi1 environment variable to solve the problem, but InfluxDB is still crashing.
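For reference, InfluxDB 1.x supports overriding any influxdb.conf setting through environment variables of the form INFLUXDB_<SECTION>_<SETTING>. A minimal sketch of what this variable is supposed to do (whether the add-on actually passes it through to influxd is the open question):

```sh
# InfluxDB 1.x env-var override:
INFLUXDB_DATA_INDEX_VERSION=tsi1
# ...is equivalent to this in influxdb.conf:
#   [data]
#     index-version = "tsi1"
# Caveat: this only applies to newly created shards; shards that already
# exist keep their inmem index unless converted or re-imported.
```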
Expected behavior
Not to crash
Actual behavior
Crash
Steps to reproduce
Versions:
Add-on: InfluxDB
Scalable datastore for metrics, events, and real-time analytics
-----------------------------------------------------------
Add-on version: 4.5.0
You are running the latest version of this add-on.
System: Home Assistant OS 9.0 (armv7 / raspberrypi4)
Home Assistant Core: 2022.9.7
Home Assistant Supervisor: 2022.09.1
-----------------------------------------------------------
Please, share the above information when looking for help
or support in, e.g., GitHub, forums or the Discord chat.
Logs:
runtime: out of memory: cannot allocate 8192-byte block (544145408 in use)
fatal error: out of memory
goroutine 31 [running]:
runtime.throw(0xfebdde, 0xd)
/usr/local/go/src/runtime/panic.go:774 +0x5c fp=0x51c7114 sp=0x51c7100 pc=0x41644
runtime.(*mcache).refill(0xb6f73a34, 0x1f)
/usr/local/go/src/runtime/mcache.go:140 +0xfc fp=0x51c7128 sp=0x51c7114 pc=0x262ec
runtime.(*mcache).nextFree(0xb6f73a34, 0x1f93b21f, 0x1, 0x1b0, 0xd01ac0)
/usr/local/go/src/runtime/malloc.go:854 +0x7c fp=0x51c7148 sp=0x51c7128 pc=0x1b0f4
runtime.mallocgc(0xe0, 0x0, 0x0, 0xc9e74c)
/usr/local/go/src/runtime/malloc.go:1022 +0x7a0 fp=0x51c71b0 sp=0x51c7148 pc=0x1ba40
runtime.rawbyteslice(0xd1, 0x0, 0x0, 0x0)
/usr/local/go/src/runtime/string.go:272 +0x84 fp=0x51c71cc sp=0x51c71b0 pc=0x5f908
runtime.stringtoslicebyte(0x0, 0xbe57c7e0, 0xd1, 0xf0, 0x24bdfe38, 0x1)
/usr/local/go/src/runtime/string.go:161 +0xa4 fp=0x51c71ec sp=0x51c71cc pc=0x5f354
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).WriteMulti(0x49ee500, 0xbfba060, 0x51c7510, 0x1eb5b08)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:343 +0x268 fp=0x51c72d8 sp=0x51c71ec pc=0xc9e708
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*CacheLoader).Load.func1(0x51c7594, 0x4a12190, 0x51c758c, 0x49ee500, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:747 +0x4d4 fp=0x51c7568 sp=0x51c72d8 pc=0xd13240
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*CacheLoader).Load(0x4a12190, 0x49ee500, 0x1, 0x1)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:758 +0x88 fp=0x51c759c sp=0x51c7568 pc=0xca00bc
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).reloadCache(0x51da000, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2408 +0x1d4 fp=0x51c76ec sp=0x51c759c pc=0xccfbe4
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).Open(0x51da000, 0x4b44200, 0x1ed86d8)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:754 +0x28c fp=0x51c7734 sp=0x51c76ec pc=0xcc69b4
github.com/influxdata/influxdb/tsdb.(*Shard).Open.func1(0x51b2000, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/tsdb/shard.go:344 +0x298 fp=0x51c7a58 sp=0x51c7734 pc=0x5d12c4
github.com/influxdata/influxdb/tsdb.(*Shard).Open(0x51b2000, 0x4b44240, 0x494f2f0)
/go/src/github.com/influxdata/influxdb/tsdb/shard.go:355 +0x1c fp=0x51c7a84 sp=0x51c7a58 pc=0x5b9dac
github.com/influxdata/influxdb/tsdb.(*Store).loadShards.func1(0x4ae66c0, 0x49c2140, 0x4f99b00, 0x4ae6700, 0xfd51f0, 0x4be8420, 0x4f1e400, 0x4a80ed4, 0x9, 0x494e6de, ...)
/go/src/github.com/influxdata/influxdb/tsdb/store.go:404 +0x4c4 fp=0x51c7fb4 sp=0x51c7a84 pc=0x5d23e4
runtime.goexit()
/usr/local/go/src/runtime/asm_arm.s:868 +0x4 fp=0x51c7fb4 sp=0x51c7fb4 pc=0x73610
created by github.com/influxdata/influxdb/tsdb.(*Store).loadShards
/go/src/github.com/influxdata/influxdb/tsdb/store.go:362 +0xb64
goroutine 1 [chan receive, 1 minutes]:
github.com/influxdata/influxdb/tsdb.(*Store).loadShards(0x49c2140, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/tsdb/store.go:421 +0x115c
github.com/influxdata/influxdb/tsdb.(*Store).Open(0x49c2140, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/tsdb/store.go:221 +0x1a4
github.com/influxdata/influxdb/cmd/influxd/run.(*Server).Open(0x49be320, 0x4ab3cac, 0x49be320)
/go/src/github.com/influxdata/influxdb/cmd/influxd/run/server.go:444 +0x894
github.com/influxdata/influxdb/cmd/influxd/run.(*Command).Run(0x49b7800, 0x48920f0, 0x0, 0x0, 0x0, 0x48920f0)
/go/src/github.com/influxdata/influxdb/cmd/influxd/run/command.go:149 +0x7e4
main.(*Main).Run(0x4ab3f8c, 0x48920f0, 0x0, 0x0, 0x2b7a8f8, 0x4890030)
/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:81 +0x104
main.main()
/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:45 +0x140
goroutine 34 [syscall, 1 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:147 +0x130
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x14
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x30
goroutine 4 [select]:
github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view.(*worker).start(0x4adc680)
/go/src/github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view/worker.go:154 +0xb0
created by github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view.init.0
/go/src/github.com/influxdata/influxdb/vendor/go.opencensus.io/stats/view/worker.go:32 +0x48
goroutine 5 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0xa6bc7f80, 0x72, 0x0)
/usr/local/go/src/runtime/netpoll.go:184 +0x44
internal/poll.(*pollDesc).wait(0x49bcab4, 0x72, 0x0, 0x0, 0xfe25d8)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x30
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0x49bcaa0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:384 +0x1a8
net.(*netFD).accept(0x49bcaa0, 0x0, 0xa8001, 0x0)
/usr/local/go/src/net/fd_unix.go:238 +0x20
net.(*TCPListener).accept(0x4b31230, 0x4b3eb80, 0x40000000, 0x0)
/usr/local/go/src/net/tcpsock_posix.go:139 +0x20
net.(*TCPListener).Accept(0x4b31230, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/net/tcpsock.go:261 +0x3c
github.com/influxdata/influxdb/tcp.(*Mux).Serve(0x4b3eb80, 0x1ec3ff0, 0x4b31230, 0x4b31230, 0x0)
/go/src/github.com/influxdata/influxdb/tcp/mux.go:75 +0x64
created by github.com/influxdata/influxdb/cmd/influxd/run.(*Server).Open
/go/src/github.com/influxdata/influxdb/cmd/influxd/run/server.go:395 +0x1f0
[16:22:26] WARNING: InfluxDB crashed, halting add-on
s6-rc: info: service legacy-services: stopping
[16:22:26] INFO: InfluxDB stopped, restarting...
[16:22:26] INFO: NGINX stopped, restarting...
s6-svwait: fatal: supervisor died
[16:22:26] INFO: Chronograf stopped, restarting...
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
[16:22:26] INFO: Kapacitor stopped, restarting...
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
I'm seeing the same error with 4.3 and 4.5.
runtime: out of memory: cannot allocate 8192-byte block (564363264 in use)
fatal error: out of memory
goroutine 2017 [running]:
@welcoMattic - did you find a solution?
I suspect it is something size-related. The memory in use, assuming 564363264 is bytes rather than blocks, is only about 0.53 GB (564363264 / 1024³ ≈ 0.526 GiB).
I've increased swap to 6 GB, so I have over 5 GB of swap available, but it still fails to allocate at that point.
Is the add-on limiting the amount of memory available to the container?
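A sketch of how this could be checked from the host OS; the container name below is an assumption based on the community add-on naming scheme, so confirm it with docker ps first:

```sh
# Confirm the actual container name:
docker ps --format '{{.Names}}'
# Memory cap on the container (0 = unlimited, otherwise bytes):
docker inspect --format '{{.HostConfig.Memory}}' addon_a0d7b954_influxdb
# Current usage versus the limit:
docker stats --no-stream addon_a0d7b954_influxdb
```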
This post from the Influx community, "Fatal error: out of memory", seems relevant, and is maybe where you got the idea to change the index type from inmem to tsi1.
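Worth noting: the index setting only applies to new shards, so existing inmem shards would still be loaded into memory on startup. A hedged sketch of converting them with InfluxDB's influx_inspect tool (stop InfluxDB first; the paths are the stock image defaults and may differ in the add-on):

```sh
# Rebuild on-disk TSI indexes for existing shards:
influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal
# Run as the user that owns the data directory, then restart InfluxDB.
```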
@welcoMattic - were you able to confirm that setting the env variable for the add-on actually changed the configuration inside of the InfluxDB container?
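One way to verify, as a sketch (again, the container name is an assumption):

```sh
# Inspect the environment of the running influxd process itself:
docker exec addon_a0d7b954_influxdb sh -c \
  'tr "\0" "\n" < /proc/$(pidof influxd)/environ' | grep INFLUXDB_DATA
# Ask influxd for its resolved configuration:
docker exec addon_a0d7b954_influxdb influxd config | grep index-version
```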
I had to completely reinstall InfluxDB with this env var set. After reimporting my data, it's OK: no more crashes.
How were you able to export your data if InfluxDB was crashing?
I had daily backups of my HA instance; I grabbed the InfluxDB data from the most recent one.
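For anyone else hitting this: if the old instance can still be started long enough, InfluxDB 1.5+ also has a portable backup/restore that avoids restoring a whole HA snapshot. A sketch (database name and target path are assumptions):

```sh
# On the old instance (influxd must be running):
influxd backup -portable -database homeassistant /backup/influxdb
# On the freshly reinstalled instance
# (add -newdb if the target database already exists):
influxd restore -portable -db homeassistant /backup/influxdb
```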
There hasn't been any activity on this issue recently, so we clean up some of the older and inactive issues.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thanks!