Heroic appears to mis-handle `EsRejectedExecutionException` errors from ElasticSearch in batched queries
Opened this issue · 2 comments
rf commented
I'm seeing an issue in production where a EsRejectedExecutionException
result from ElasticSearch will cause Heroic to return a correct-looking response but with empty results and empty errors. The responses look like this:
"redactedkey:1": {
"queryId": "21ee7a67-1023-495b-b2dd-55609d914f7a",
"range": {
"start": 1593316800000,
"end": 1595736600000
},
"trace": {
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query"
},
"elapsed": 537117,
"children": [
{
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query_shard[{site=us-east1}]"
},
"elapsed": 537157,
"children": [
{
"what": {
"name": "grpc://heroic-2.heroic.default.svc.cluster.local.:9698#query"
},
"elapsed": 537078,
"children": [
{
"what": {
"name": "com.spotify.heroic.metric.LocalMetricManager#query"
},
"elapsed": 327049,
"children": []
}
]
}
]
}
]
},
"limits": [],
"cached": false,
"cache": null,
"commonTags": {},
"commonResource": {},
"result": [],
"preAggregationSampleSize": 0,
"errors": []
},
And in our heroic logs:
Failed to execute phase [fetch],
21:47:18.350 [elasticsearch[_client_][listener][T#3]] ERROR com.spotify.heroic.rpc.grpc.GrpcRpcProtocolServer - 29c3cfe3-ca6c-4c16-bc16-b755c8867848: Request failed
... 48 more
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:94)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50)
Caused by: EsRejectedExecutionException[rejected execution of org.elasticsearch.action.search.FetchSearchPhase$1@4606797a on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@433876a3[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 3259109]]]
at java.lang.Thread.run(Thread.java:748)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
Heroic should pass this response along to the client in the errors
array.
lmuhlha commented
Thanks for reporting this! Just to confirm, are you running version 2.2.0 or an earlier one?
memory commented
We are running 0.10.5
at the moment.