performancecopilot/parfait

parfait-agent initializes before main app so JMX MBeans do not exist

rwallinterset opened this issue · 7 comments

I have been trying to figure out how to use Parfait, and I thought I had it all figured out, but I can't get my custom JMX MBeans registered. I am currently running Parfait via the -javaagent argument when launching my app. After lots of messing around, I think the issue is that Parfait is loaded before my app initializes, so the JMX MBeans do not exist at the time Parfait is initialized. What am I doing wrong?

I also tried to figure out how the JMX connector works, as I thought that might be a better way to go, but I can't figure out how to configure proxy.xml to point it at a specific Java app.

Any help would be appreciated.

OK, so I confirmed that was the issue. I created a new thread class, and if I spin up a thread that sleeps and then executes the ParfaitAgent premain method, all works fine.

@natoscott will have to comment directly here on the current state, but I believe right now parfait-agent will only export some known JVM metrics; there's not yet a mechanism to export custom (application-specific) metrics (though that is definitely the plan).

Part of the Parfait lifecycle requires a 'quiet' period for new metrics to appear - it waits up to 5 seconds after the last detected metric has been registered before (re)starting the export. This prevents a large application that registers a lot of metrics during startup from triggering a blitz of registrations to PCP. Once the application has quieted down, Parfait will begin exporting to PCP.
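To make the 'quiet' period idea concrete, here is a minimal sketch of a debounce timer: every registration restarts the countdown, and the export action runs only once no new registration has arrived for the whole period. This is not Parfait's actual code; the class name `QuietPeriodTrigger` and its methods are hypothetical and exist only for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a "quiet period"; NOT Parfait's implementation.
class QuietPeriodTrigger {
    // Single daemon thread so the timer never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "quiet-period");
                t.setDaemon(true);
                return t;
            });
    private final long quietMillis;
    private final Runnable action;
    private ScheduledFuture<?> pending;

    QuietPeriodTrigger(long quietMillis, Runnable action) {
        this.quietMillis = quietMillis;
        this.action = action;
    }

    // Call on every metric registration: cancels a not-yet-started run
    // and restarts the countdown from zero.
    synchronized void metricRegistered() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(action, quietMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With a 5000 ms period, a burst of registrations during startup would result in a single run of the action, roughly 5 seconds after the last registration in the burst.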

Eventually your own JMX/custom metrics will be registered by a mechanism like an XML config.

You might also want to look at the Parfait-DropWizard integration (part of this module). If you are using DropWizard metrics in the application, it's pretty easy to have these exported to PCP.

@tallpsmith that's correct - addition of user-defined metrics, as described, is WIP (as is exposing the Parfait delayed initialization concept). With the earlier reference to proxy.xml, that's just a placeholder at this stage - also WIP.

Just a bit more followup here - parfait-agent has always used the DynamicMonitoringView Parfait class, which performs the delayed-initialisation-if-needed @tallpsmith referred to above (re 'quiet' period). I've just added some code to allow the application startup time to be set to something different to the default 5 seconds if need be. I'll be looking into the proxying mode shortly too.

The above startup delay time solves the issue of application MBeans not being available when Parfait is starting or if the application is somewhat deterministically slow to create them when starting up.

The problem with a simple delay-based approach is that applications may do lazy or on-demand initialization, i.e., MBeans are created later in the application lifecycle. Examples include an external configuration management event or a dynamically attached tracing tool making new MBeans available; both approaches are used to avoid restarting applications in production. In cases like this, all metrics should be collected once they are available, without Parfait forcing an application restart, provided the application itself (or a troubleshooting tool) supports creating MBeans (metrics) on demand.

So, ideally, there would be a configuration option to ask Parfait to continue with those metrics available in the beginning and then periodically or via an event to check whether the other configured metrics have appeared.
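The periodic variant of this could look roughly like the sketch below: keep a set of configured-but-missing MBean names and ask the MBeanServer on each pass which of them now exist. It uses only the standard JMX call `MBeanServer.isRegistered`; the class `PendingMBeanPoller` is a hypothetical name, and nothing here reflects an existing Parfait option.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: tracks configured MBeans that have not appeared yet.
class PendingMBeanPoller {
    private final MBeanServer server;
    private final Set<ObjectName> pending = new LinkedHashSet<>();

    PendingMBeanPoller(MBeanServer server, Set<ObjectName> configured) {
        this.server = server;
        this.pending.addAll(configured);
    }

    // Returns the names that became available since the previous poll
    // and stops tracking them.
    synchronized Set<ObjectName> poll() {
        Set<ObjectName> appeared = new LinkedHashSet<>();
        for (Iterator<ObjectName> it = pending.iterator(); it.hasNext();) {
            ObjectName name = it.next();
            if (server.isRegistered(name)) {
                appeared.add(name);
                it.remove();
            }
        }
        return appeared;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

A caller could run `poll()` from a `ScheduledExecutorService` at a fixed delay and start exporting whatever it returns, while the metrics that were available at startup are exported immediately.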

This isn't just a simple delay; it's a quiet period. Any new metric added, at any time, will wait up to 5 seconds before the DynamicMonitoringView reconfigures and rewrites the new MMV values.

What it seems might be needed (without looking at the current state) is an MBeanListener that detects MBeans being added/removed and wraps them with PCP metrics (if matched).
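As a rough sketch of that idea using only standard JMX: the MBeanServer delegate emits an `MBeanServerNotification` whenever an MBean is registered or unregistered, so a listener on `MBeanServerDelegate.DELEGATE_NAME` can react to MBeans appearing at any time. The class `MBeanRegistrationWatcher` and its callbacks are hypothetical; how a matched MBean would actually be wrapped as a PCP metric is left out because that part of Parfait is not described here.

```java
import java.util.function.Consumer;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServer;
import javax.management.MBeanServerDelegate;
import javax.management.MBeanServerNotification;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

// Hypothetical listener: reports MBeans matching a pattern as they come and go.
class MBeanRegistrationWatcher implements NotificationListener {
    private final ObjectName pattern;
    private final Consumer<ObjectName> onRegistered;
    private final Consumer<ObjectName> onUnregistered;

    private MBeanRegistrationWatcher(ObjectName pattern,
                                     Consumer<ObjectName> onRegistered,
                                     Consumer<ObjectName> onUnregistered) {
        this.pattern = pattern;
        this.onRegistered = onRegistered;
        this.onUnregistered = onUnregistered;
    }

    // Subscribes to the delegate MBean, which broadcasts all (un)registrations.
    static MBeanRegistrationWatcher attach(MBeanServer server, ObjectName pattern,
                                           Consumer<ObjectName> onRegistered,
                                           Consumer<ObjectName> onUnregistered)
            throws InstanceNotFoundException {
        MBeanRegistrationWatcher watcher =
                new MBeanRegistrationWatcher(pattern, onRegistered, onUnregistered);
        server.addNotificationListener(MBeanServerDelegate.DELEGATE_NAME, watcher, null, null);
        return watcher;
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
        if (!(notification instanceof MBeanServerNotification)) {
            return;
        }
        ObjectName name = ((MBeanServerNotification) notification).getMBeanName();
        if (!pattern.apply(name)) {
            return; // not one of the configured metrics
        }
        String type = notification.getType();
        if (MBeanServerNotification.REGISTRATION_NOTIFICATION.equals(type)) {
            onRegistered.accept(name);
        } else if (MBeanServerNotification.UNREGISTRATION_NOTIFICATION.equals(type)) {
            onUnregistered.accept(name);
        }
    }
}
```

Attaching it to `ManagementFactory.getPlatformMBeanServer()` with a pattern such as `example.app:*` (an illustrative domain) would invoke the callbacks for lazily created MBeans. MBeans that were registered before the listener was attached would still need a one-off `queryNames` pass.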

I don't think it's the MonitoringView that's the issue here, but the MBean detections.

If you think 'delay' is the wrong term here, please fix the parfait(1) man page accordingly, as it uses 'delay' as well.