prometheus/jmx_exporter

Have some example rules for generic JVM metrics?

EdSchouten opened this issue · 39 comments

Hi there,

We're making use of the JMX exporter in combination with Tomcat and Cassandra. While using it, we've noticed that none of the examples contain any generic rules for extracting properties out of the JVM, like memory usage. Right now we're planning on adding something like this to our configs:

- pattern: 'java.lang<type=Memory><(\w+)MemoryUsage>(\w+):'
  name: java_memory_usage_$2_bytes
  labels:
    area: "$1"
  help: Java $2 $1 memory usage
  type: GAUGE

Resulting in metrics like these:

java_memory_usage_committed_bytes{area="Heap",} 1.489731584E9
java_memory_usage_committed_bytes{area="NonHeap",} 3.9223296E8
java_memory_usage_max_bytes{area="Heap",} 4.294770688E9
java_memory_usage_max_bytes{area="NonHeap",} 4.17333248E8
java_memory_usage_init_bytes{area="Heap",} 2.73888048E8
java_memory_usage_init_bytes{area="NonHeap",} 3.69557504E8
java_memory_usage_used_bytes{area="Heap",} 9.61119192E8
java_memory_usage_used_bytes{area="NonHeap",} 2.09771528E8

Question: would it make sense to have some kind of examples file containing rules like these?

These shouldn't be in the examples, as the standard exports of the agent already provide these metrics in a better format.

Oh, wow. Running it separately or as an agent determines which metrics are exported by default? Interesting. Is this documented somewhere? What kind of metrics are exported by default when running as an agent?

Those metrics should always be exported, but they're not enabled for the http server as they'd be confusing.

Should we document this somewhere?

Oh. I just ran into this. How can I get the http server to expose metrics like jvm_memory_bytes_used?

Run it as a java agent.
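
For reference, attaching the agent is just the standard -javaagent flag on the JVM command line; a minimal sketch following the README's pattern, where the jar name, port, and config path are placeholders:

java -javaagent:./jmx_prometheus_javaagent.jar=8080:config.yaml -jar your_application.jar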

hejix commented

Hi Brian, there are many good reasons to run the jmx_exporter separately as a server, the main one being able to change the metrics captured without having to restart the underlying process. I would appreciate either being able to turn on the standard JVM metrics, or knowing the rules to enable them, as mentioned by earlier posters.

the main one being able to change the metrics captured without having to restart the underlying process.

That doesn't require a restart.

How do I drop values scraped by default i.e. not controlled by the exporter config?
Stuff like:
jmx_config_reload_success_total 0.0
jvm_memory_pool_bytes_committed{pool="Code Cache",}
Do I need to drop those in the relabeling phase?
I am running it as an agent.

Why do you want to drop them? They're all fairly important.

I want to spend my bytes on other things.

A couple of items that would dramatically reduce the engineering cost of Prometheus for us:

  1. Can you elaborate on

That doesn't require a restart.

How do I get it to reload the conf without restarting the process?

  2. I can't get the logging config to work, so I get zero log output and have no idea why it fails. In most cases the apps I'm scraping use a mix of log4j, logback, and java.util.logging.

I want to spend my bytes on other things.

If you're so short on resource that this matters (which seems unlikely), you can drop them on the Prometheus end.
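
For reference, dropping them on the Prometheus side is done with metric_relabel_configs in the scrape config; a minimal sketch where the job name, target, and exact regex are placeholders:

scrape_configs:
  - job_name: 'jmx'
    static_configs:
      - targets: ['myhost:9404']
    metric_relabel_configs:
      # Drop the default JVM and config-reload series before they are ingested
      - source_labels: [__name__]
        regex: 'jvm_.*|jmx_config_.*'
        action: drop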

How do I get it to reload the conf without restarting the process?

Change the config file on disk; it is re-read automatically (that is what the jmx_config_reload_success_total metric tracks).

In most cases the apps I'm scraping use a mix of log4j, logback, and java.util.logging.

If it's stats on those you want, look at the simpleclient modules for those. Neither exposes JMX metrics as far as I'm aware.

This has gotten way off-topic, so closing. Please ask usage questions on https://groups.google.com/forum/#!forum/prometheus-users

I would like to re-open this issue @brian-brazil. All the process metrics for Kafka are available through JMX. Since we're running the httpserver against the kafka JMX metrics, it makes sense to have an example for the JVM metrics since they're already available.

Also, I think your logic here is extremely confusing

they're not enabled for the http server as they'd be confusing.

Having inconsistent and undocumented defaults is even more confusing.

All the process metrics for Kafka are available through JMX.

Not all of them.

Use of the agent is recommended over the httpserver, and I do not wish to have to maintain two versions of the same bit of code/configuration.

Thank you for your response.
Is it possible to run the agent with Kafka? Do I have to restart kafka to attach the agent?

Yes, you need to restart kafka to attach it. https://www.robustperception.io/monitoring-kafka-with-prometheus/ has instructions.
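
In short, the agent jar is passed to the broker JVM via KAFKA_OPTS before starting it; a rough sketch along those lines, where the paths, port, and config file name are placeholders:

KAFKA_OPTS="-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx_exporter/kafka.yml" ./bin/kafka-server-start.sh config/server.properties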

Thank you, last question: are there plans to deprecate the httpserver?

No, there are no plans to do so.

Those metrics should always be exported, but they're not enabled for the http server as they'd be confusing.

@brian-brazil can you add some sort of tip to the readme that jvm_* metrics are only exposed when using the Java agent? It took me an hour or two of troubleshooting and searching old issues to figure this out, after playing only with the HTTP server version. Thanks!

We recommend always using the java agent, if you go against that and use the http server this is one of the issues you'll run into.

We recommend always using the java agent, if you go against that and use the http server this is one of the issues you'll run into.

@brian-brazil yes I understand that now, but only after an hour or two of troubleshooting and reading the GitHub issues. My comment above was a suggestion that you should make it more obvious in the readme.

The readme isn't very helpful for new users, in my opinion. For example: I'm a sysadmin, not a Java developer. I don't know what JMX or a Java agent is, I'm just trying to monitor some application server! It took me a few hours to figure out how to get this thing running first of all, and then it's not obvious that the JVM metrics aren't exposed via the http server.

The first section is https://github.com/prometheus/jmx_exporter#running which gives you a full example command line, I'm not sure it's possible to make it any simpler.

Brian,

I don't think the issue necessarily is that people don't know how to run this as an agent. It's more that the README doesn't explain the difference between running it as an agent and running it as a separate process. This is why this issue still gets traffic every couple of weeks or months from people who run into the same problems as I did.

The very top of the README currently has these two sentences right after each other:

It meant to be run as a Java Agent, exposing an HTTP server and scraping the local JVM.
This can be also run as an independent HTTP server and scrape remote JMX targets.

Would it make sense to extend this?

I've tried to keep the readme light on the http server, so that users would follow the easy path and use the agent as that's what's most clearly documented and works best. At the end of the day users often over-complicate things for themselves, and I can't stop all of them.

Do you have wording you'd suggest?

@brian-brazil Any thoughts on #204?

@brian-brazil Hi ... reading this issue I don't see any real reason why the HTTP server mode doesn't support JVM metrics, and I don't understand what you mean by "they'd be confusing". I'm thinking about using it in a "sidecar" approach on Kubernetes/OpenShift, but it seems that you really discourage the HTTP server mode. Can you explain why? Is it related to maintaining the source code because it will be deprecated in the near future, or are there technical reasons?

@brian-brazil any update on the above question?

@brian-brazil Hi Brian, this might be too naive a question, but I'm using the JMX Exporter as an agent as described in the robustperception link, and I want to customize the names of the metrics generated for the jvm.* and process.* metrics. The reason is that the same jvm/process metrics are generated by multiple targets and fed into Prometheus, which doesn't let me pick out the metrics from the Kafka nodes specifically. I couldn't figure out a way to do this. It would be really helpful if you could point me in the right direction. Thanks

Oh wow, making a design choice based on the belief that other developers and users don't understand the JVM?

There are no performance or implementation limitations preventing JVM metrics from being available over the HTTP server, while there are a number of reasons not to run this as a java agent. It is simply a decision by the developers of this tool not to do it, because they want it that way. Amazing.

@EdSchouten I fixed one of your patterns:

- pattern: 'java.lang<type=Memory><(\w+)MemoryUsage>(\w+): (\d+)'
  name: java_memory_usage_$2_bytes
  labels:
    area: "$1"
  value: $3
  # help: Java $2 $1 memory usage
  type: GAUGE

This returns:

# HELP java_memory_usage_used_bytes java.lang.management.MemoryUsage (java.lang<type=Memory><HeapMemoryUsage>used)
# TYPE java_memory_usage_used_bytes gauge
java_memory_usage_used_bytes{area="Heap",} 5.47159024E8
java_memory_usage_used_bytes{area="NonHeap",} 7.235664E7

You can change the help line as you want.

This is the final set of patterns I'm using:

- pattern: 'java.lang<type=Memory><(\w+)MemoryUsage>(\w+): (\d+)'
  name: jvm_memory_usage_$2_bytes
  labels:
    area: "$1"  # Heap/NonHeap
  value: $3
  type: GAUGE

# name is always the same, the name of the GC
- pattern: 'java.lang<type=GarbageCollector, name=[^,]+, key=([^>]+)><LastGcInfo, memoryUsageAfterGc>(used|committed): (\d+)'
  name: jvm_memory_after_gc_$2_bytes
  value: $3
  labels:
    space: $1
  type: GAUGE

- pattern: 'java.lang<type=GarbageCollector, name=[^>]+><LastGcInfo>duration: (\d+)'
  name: jvm_gc_duration_seconds
  value: $1
  type: GAUGE
  # Convert microseconds to seconds
  valueFactor: 0.000001

# java.lang<type=GarbageCollector, name=G1 Young Generation><>CollectionCount
- pattern: 'java.lang<type=GarbageCollector, name=([^>]+)><>CollectionCount: (\d+)'
  name: jvm_gc_collection_count
  value: $2
  labels:
    name: $1
  type: GAUGE

@mdione-cloudian we had to use a slightly modified version of your patterns to get java metrics out of kafka. /cc @sraghav1

    - pattern: 'java.lang<type=Memory><(\w+)MemoryUsage>(\w+): (\d+)'
      name: jvm_memory_usage_$2_bytes
      labels:
        area: "$1"  # Heap/NonHeap
      value: $3
      type: GAUGE
    - pattern: 'java.lang<name=([\s\w]+), type=GarbageCollector, key=(\w+)>(.*): (\d+)'
      name: jvm_gc_$3
      labels:
        name: $1
        key: $2
      value: $4
      type: GAUGE
    - pattern: 'java.lang<name=([\s\w]+), type=MemoryPool, key=(\w+)>(.*): (\d+)'
      name: jvm_mempool_$3
      labels:
        name: $1
        key: $2
      value: $4
      type: GAUGE
    - pattern: 'java.lang<name=([\s\w]+), type=GarbageCollector>(.*): (\d+)'
      name: jvm_gc_$2
      labels:
        name: $1
      value: $3
      type: GAUGE
    - pattern: 'java.lang<name=([\s\w]+), type=MemoryPool>(.*): (\d+)'
      name: jvm_mempool_$2
      labels:
        name: $1
      value: $3
      type: GAUGE

Does anyone have patterns to collect thread and CPU metrics?

fstab commented

Maybe you don't need the jmx_exporter for that. You could just use the simpleclient_hotspot and call

new StandardExports().register();
new ThreadExports().register();

https://mvnrepository.com/artifact/io.prometheus/simpleclient_hotspot
https://github.com/prometheus/client_java/tree/master/simpleclient_hotspot
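
For anyone following along, a minimal sketch of wiring that up and exposing the result over HTTP, assuming the simpleclient_hotspot and simpleclient_httpserver artifacts are on the classpath (class name and port are placeholders):

import io.prometheus.client.exporter.HTTPServer;
import io.prometheus.client.hotspot.StandardExports;
import io.prometheus.client.hotspot.ThreadExports;

public class MetricsBootstrap {
    public static void main(String[] args) throws Exception {
        // Register process-level (CPU time, open FDs, memory) and thread collectors
        new StandardExports().register();
        new ThreadExports().register();
        // Serve everything in the default registry on http://localhost:9404/metrics
        new HTTPServer(9404);
    }
}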

Sadly I'm trying to use this with a very legacy app that runs only on Java 1.5, so I need to use the jmx http agent as is. Does anyone know how to get CPU and thread metrics using the http agent?

The jmx_exporter agent is not exporting CPU metrics, can anyone suggest what needs to be done?

@nareshb123 @matiasba asked for the same thing. Maybe he figured it out?

@nareshb123 @matiasba asked for the same thing. Maybe he figured it out?

Sadly I just gave up on this, there is very little documentation. I just told the dev team that they need to upgrade their Java version if they want to use Prometheus.

@brian-brazil Just try to start the java agent with JBoss/WildFly on Java 11 and you will have much fun (spoiler: it just won't run, JBoss/WildFly won't start; and even running on Java 8 is quite a pain), so the http server and its examples are useful in some cases.
You may state that there's a bug in JBoss/WildFly (and that might be a fairly accurate statement), but nevertheless we need monitoring data before bugs are fixed...