riemann/riemann

Netty executor queue size is unbounded, resulting in GC pressure / OOM

nukemberg opened this issue · 4 comments

Describe the bug
In cases of overload with many clients, or clients that mishandle backpressure, the Netty executor queue can grow without bound, consuming massive amounts of memory. This causes GC pressure on the server, which further aggravates the problem and ultimately leads to OOM.

To Reproduce
Run Riemann with many clients, or with clients that have a very large number of outstanding requests, and slow down the streams so that a backlog builds up.

Expected behavior
The Netty executor queue size should be limited and excess messages dropped; when that happens, a special "overload" event should be injected into the streams. Alternatively (perhaps configurable?), TCP backpressure should be applied as a last resort.
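
For illustration, here is a hypothetical sketch (using standard Riemann event fields; none of this exists in Riemann today) of the kind of event that could be injected when messages are dropped:

```clojure
;; Hypothetical overload event -- the field values are illustrative only.
{:host        "riemann-host"
 :service     "riemann netty executor overload"
 :state       "critical"
 :metric      1234        ; e.g. messages dropped since the last report
 :description "netty executor queue full; incoming messages were dropped"
 :time        (quot (System/currentTimeMillis) 1000)}
```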

sanel commented

Try setting the property -Dio.netty.eventLoopThreads=N, where N > 1, at Riemann startup. See [1].

[1] https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/MultithreadEventLoopGroup.java#L41
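
For reference, a sketch (assuming Netty 4.1) of how the default in [1] is resolved, i.e. what N overrides:

```clojure
;; DEFAULT_EVENT_LOOP_THREADS resolves to
;; max(1, io.netty.eventLoopThreads or 2 * available processors).
(let [prop (System/getProperty "io.netty.eventLoopThreads")
      n    (if prop
             (Long/parseLong prop)
             (* 2 (.availableProcessors (Runtime/getRuntime))))]
  (max 1 n))
```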

sanel commented

I misread your message, so my comment above might not solve your issue. Try toying with io.netty.recycler.maxCapacity.default as well. Netty has a builtin recycler to reduce GC pressure, and the value should be > 256 (the default is 262144).

Also, this will not solve your problems if you have badly designed streams that aggregate values indefinitely. A small reproducible example and a full stack trace would be helpful.
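
For illustration, a riemann.config fragment of the kind of stream that can hold on to state indefinitely (the stream functions are standard Riemann ones; the example itself is not from this issue):

```clojure
;; `by` creates and caches a child stream per distinct :host and, as far as I
;; know, never discards it, so with churning hosts every window below stays
;; in memory forever.
(streams
  (by [:host]
    (moving-event-window 10000
      (smap folds/sum prn))))
```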

nukemberg commented

I should have been clearer: the executor in question is the so-called Riemann netty event-executor, which is where streams are handled, not the Netty I/O threads; it is defined here. The executor is an io.netty.util.concurrent.SingleThreadEventExecutor with a default DEFAULT_MAX_PENDING_EXECUTOR_TASKS of Integer.MAX_VALUE (see here); this can be changed via the io.netty.eventexecutor.maxPendingTasks system property. However, I believe this is a bad default for a system like Riemann, and it should also be settable via the Riemann config rather than through an obscure Netty system property unknown to most users.
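
For reference, a sketch of overriding that default programmatically; note that it is captured in a static initializer, so it must be set before the relevant Netty classes load, which makes the -D form on the JVM command line the safer route:

```clojure
;; Equivalent to -Dio.netty.eventexecutor.maxPendingTasks=65536 on the JVM
;; command line; 65536 is an arbitrary illustrative value.
(System/setProperty "io.netty.eventexecutor.maxPendingTasks" "65536")
```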

nukemberg commented

So, it turns out that Riemann is using io.netty.util.concurrent.DefaultEventExecutorGroup, which extends io.netty.util.concurrent.MultithreadEventExecutorGroup; every executor in the group has its own queue of size io.netty.eventexecutor.maxPendingTasks, and Riemann uses `(.. Runtime getRuntime availableProcessors)` threads, so the total queue capacity is `cpus * maxPendingTasks`.
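
For comparison, a sketch (Clojure/Java interop, assuming Netty 4.1) of how a bounded executor group could be constructed; Riemann does not expose anything like this today, and the queue size here is illustrative only:

```clojure
(import '[io.netty.util.concurrent DefaultEventExecutorGroup
                                   RejectedExecutionHandlers])

;; One queue of at most 10000 pending tasks per executor, i.e. cpus * 10000 in
;; total, with a handler that rejects excess work instead of queueing it.
(def bounded-event-executor
  (let [cpus (.availableProcessors (Runtime/getRuntime))]
    (DefaultEventExecutorGroup.
      cpus                               ; nThreads, as Riemann uses today
      nil                                ; default thread factory
      10000                              ; maxPendingTasks per executor
      (RejectedExecutionHandlers/reject))))
```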

Note that the event executor group chooses an executor in a round-robin fashion and each channel/socket is bound to that executor. This creates another performance problem, since spreading work across multiple independent queues tends to give higher latency and lower throughput than a single shared queue (a known result in queueing theory: queue lengths vary, so "evenly distributed" load will still send work to queues that are already backed up). I assume this is done to preserve event ordering, since events from the same client are enqueued and handled in order, but this guarantee is not documented or promised anywhere in Riemann. A quick REPL sketch of that pinning behavior follows below.
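
The sketch (assuming Netty 4.1 on the classpath): successive .next calls on a group cycle through its child executors, and a channel keeps whichever executor it was registered with.

```clojure
(import 'io.netty.util.concurrent.DefaultEventExecutorGroup)

(let [group (DefaultEventExecutorGroup. 4)]
  (try
    ;; eight .next calls cycle twice through the four child executors
    (vec (for [_ (range 8)]
           (System/identityHashCode (.next group))))
    (finally
      (.shutdownGracefully group))))
```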