pambrose/prometheus-proxy

prometheus-proxy reports Java heap space error


You are going to have to give me a little context for this.

OK. I use prometheus-agent to receive data from node_exporter, and the agent connects to prometheus-proxy. After prometheus-proxy runs for about two days, it reports the heap-space error in the screenshot above and then stops working entirely. Since then it reports the following content on startup (my agent config, for reference, is sketched after the trace). The amount of data my prometheus-proxy transmits is huge; could that be related?

09:21:01.717 WARN [DefaultPromise.java:581] - An exception was thrown by io.grpc.netty.NettyServerTransport$1TerminationNotifier.operationComplete() [grpc-nio-worker-ELG-3-1]
java.lang.NullPointerException: Parameter specified as non-null is null: method io.prometheus.proxy.ProxyServerTransportFilter.transportTerminated, parameter attributes
at io.prometheus.proxy.ProxyServerTransportFilter.transportTerminated(ProxyServerTransportFilter.kt)
at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.transportTerminated(ServerImpl.java:454)
at io.grpc.netty.NettyServerTransport.notifyTerminated(NettyServerTransport.java:207)
at io.grpc.netty.NettyServerTransport.access$100(NettyServerTransport.java:51)
at io.grpc.netty.NettyServerTransport$1TerminationNotifier.operationComplete(NettyServerTransport.java:141)
at io.grpc.netty.NettyServerTransport$1TerminationNotifier.operationComplete(NettyServerTransport.java:134)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1182)
at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:773)
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:749)
at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:968)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:951)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:913)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
at io.grpc.netty.AbstractNettyHandler.sendInitialConnectionWindow(AbstractNettyHandler.java:114)
at io.grpc.netty.AbstractNettyHandler.handlerAdded(AbstractNettyHandler.java:78)
at io.grpc.netty.NettyServerHandler.handlerAdded(NettyServerHandler.java:378)
at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:938)
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
at io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:572)
at io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:515)
at io.grpc.netty.ProtocolNegotiators$GrpcNegotiationHandler.userEventTriggered(ProtocolNegotiators.java:919)
at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
at io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.fireProtocolNegotiationEvent(ProtocolNegotiators.java:1090)
at io.grpc.netty.ProtocolNegotiators$WaitUntilActiveHandler.protocolNegotiationEventTriggered(ProtocolNegotiators.java:1005)
at io.grpc.netty.ProtocolNegotiators$ProtocolNegotiationHandler.userEventTriggered(ProtocolNegotiators.java:1061)
at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:324)
at io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1428)
at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:346)
at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:332)
at io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:913)
at io.grpc.netty.WriteBufferingAndExceptionHandler.handlerAdded(WriteBufferingAndExceptionHandler.java:62)
at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:938)
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:223)
at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:381)
at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:370)
at io.grpc.netty.NettyServerTransport.start(NettyServerTransport.java:153)
at io.grpc.netty.NettyServer$1.initChannel(NettyServer.java:290)
at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:938)
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
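
For reference, my agent config follows the pattern from the README; the proxy hostname and scrape URL below are placeholders for my real ones:

agent {
  proxy.hostname = proxy.example.com
  pathConfigs: [
    {
      name: node_exporter
      path: node_metrics
      url: "http://localhost:9100/metrics"
    }
  ]
}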

Help me understand your usage. You say that it works for two days and then it fails. When you say "the amount of data my prometheus-proxy transmits is huge," are you talking about the cumulative amount of data or intermittent large amounts of data?

When you describe the data as "huge," can you give me some idea of the magnitude?

Have you tried the --chunk or --zip options on the agent?
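
Something along these lines, if it helps (the flag syntax here is from memory, so check the agent's usage/help output for the exact form):

java -jar prometheus-agent.jar --config agent.conf --chunk --zip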

Yes! It works for two days and then it fails. It is the cumulative amount of data; when I look at my exporter, I see thousands of monitoring items. The exception is in prometheus-proxy, not prometheus-agent. Will adding --chunk or --zip work on prometheus-proxy?

Hundreds of people use it for days on end and I have not had anyone report anything like this, so it is a bit perplexing. It is also strange that the stacktrace entries pertaining to ProxyServerTransportFilter are missing line numbers.
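
For background, that exception message comes from the null check the Kotlin compiler generates at method entry: transportTerminated declares a non-null Attributes parameter, so when the gRPC runtime hands it a null the check throws before the method body runs, which may also be why the frame carries no line number. A rough illustration of the mechanism, not the project's actual source:

import io.grpc.Attributes
import io.grpc.ServerTransportFilter

// Overriding a Java method with a non-null Kotlin parameter makes the
// compiler insert a runtime null check at method entry.
class StrictFilter : ServerTransportFilter() {
    // Throws "Parameter specified as non-null is null" if a caller passes null.
    override fun transportTerminated(attributes: Attributes) {
    }
}

// A defensive variant declares the parameter nullable and bails out early.
class LenientFilter : ServerTransportFilter() {
    override fun transportTerminated(attributes: Attributes?) {
        attributes ?: return // tolerate a null handed in from the Java side
    }
}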

What version of Java are you using? Also, what version of prometheus-proxy are you using?

Go ahead and try the chunk options. Also, the first thing you sent me showed that you were running out of heap. Can you try running with more heap?
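
If you are running the Docker image, one way to raise the limit is something like the following; this assumes the image passes JAVA_OPTS through to the JVM (if yours does not, override the entrypoint instead), and the ports shown are the defaults from the README:

docker run --rm -p 8080:8080 -p 50051:50051 \
    -e JAVA_OPTS="-Xms512m -Xmx2g" \
    pambrose/prometheus-proxy:1.13.0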

If those do not make a difference, I can cut a new release with the latest version of gRPC, which was just updated, and have you try that. The stacktrace suggests something in gRPC is having a problem.

java version:
/app $ java --version
openjdk 11.0.13 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-alpine-r0)
OpenJDK 64-Bit Server VM (build 11.0.13+8-alpine-r0, mixed mode)

prometheus-proxy version:
pambrose/prometheus-proxy 1.13.0

I will use a larger heap space for testing.
It would be great to be able to test with the latest version, thanks.

Okay, I am in the process of posting a new release.

I posted 1.14.0. The Docker image was updated from JRE 11 to JRE 17, and the proxy/agent were updated to Kotlin 1.7.10 and gRPC 1.49.0. Give it a try and let's see if those make a difference.

OK, thanks

Success?

Yes, it succeeded. Thanks!

Excellent.