
Exposes .NET Core runtime metrics (GC, JIT, lock contention, thread pool) using the prometheus-net package.


prometheus-net.DotNetMetrics

A plugin for the prometheus-net package that exposes .NET Core runtime metrics, including:

  • Garbage collection frequencies and timings by generation/type, pause timings and GC CPU consumption ratio
  • Heap size by generation
  • Bytes allocated by small/large object heap
  • JIT compilations and JIT CPU consumption ratio
  • Thread pool size, scheduling delays and reasons for growing/shrinking
  • Lock contention
  • Exceptions thrown, broken down by type

These metrics are essential for understanding the performance of any non-trivial application. Even if your application is well instrumented, you're only getting half the story: what the runtime is doing completes the picture.

Using this package

Requirements

  • .NET 5.0+ is recommended; .NET Core 3.1+ is supported
  • The prometheus-net package

Install it

The package can be installed from NuGet:

dotnet add package prometheus-net.DotNetRuntime

Start collecting metrics

You can start metric collection with:

IDisposable collector = DotNetRuntimeStatsBuilder.Default().StartCollecting();

You can customize the types of .NET metrics collected via the Customize method:

IDisposable collector = DotNetRuntimeStatsBuilder
	.Customize()
	.WithContentionStats()
	.WithJitStats()
	.WithThreadPoolStats()
	.WithGcStats()
	.WithExceptionStats()
	.StartCollecting();

Once the collector is registered, you should see metrics prefixed with dotnet_ visible in your metric output (make sure you are exporting your metrics).
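As a fuller sketch of the steps above, the console program below starts a collector and exposes the resulting metrics over HTTP using prometheus-net's MetricServer. The port number is an arbitrary choice for illustration, and on Windows the underlying HttpListener may need elevated permissions or a URL reservation, so treat this as a minimal illustration rather than a recommended hosting setup.

```csharp
using System;
using Prometheus;                 // prometheus-net: MetricServer
using Prometheus.DotNetRuntime;   // this package: DotNetRuntimeStatsBuilder

public static class Program
{
    public static void Main()
    {
        // Port 9184 is an assumption made for this sketch; use whatever suits your setup.
        using var server = new MetricServer(port: 9184);
        server.Start();

        // Start collecting runtime metrics with the default settings.
        // Disposing the collector stops collection.
        using IDisposable collector = DotNetRuntimeStatsBuilder.Default().StartCollecting();

        Console.WriteLine("Serving metrics on http://localhost:9184/metrics. Press Enter to exit.");
        Console.ReadLine();
    }
}
```

With the program running, the metrics endpoint should include entries whose names start with dotnet_.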

Choosing a CaptureLevel

By default, the library generates metrics based on event counters. This allows for basic instrumentation of applications with very little performance overhead.

You can enable higher-fidelity metrics by providing a custom CaptureLevel, for example:

DotNetRuntimeStatsBuilder
	.Customize()
	.WithGcStats(CaptureLevel.Informational)
	.WithExceptionStats(CaptureLevel.Errors)
	...

Most builder methods accept a custom CaptureLevel; see the documentation on exposed metrics for more information.
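If you want to change the capture level without recompiling, one option is to read it from configuration. The sketch below reads it from an environment variable; the variable name DOTNET_METRICS_GC_LEVEL and the CaptureLevelConfig helper are hypothetical and not part of this package, and the code assumes CaptureLevel lives in the Prometheus.DotNetRuntime namespace.

```csharp
using System;
using Prometheus.DotNetRuntime;

public static class CaptureLevelConfig
{
    // Hypothetical helper: parses a CaptureLevel name (e.g. "Verbose") from an
    // environment variable, returning the fallback when it is unset or invalid.
    public static CaptureLevel FromEnvironment(string variable, CaptureLevel fallback)
    {
        var raw = Environment.GetEnvironmentVariable(variable);

        // TryParse also accepts numeric strings, so reject values that do not
        // correspond to a defined CaptureLevel member.
        return Enum.TryParse(raw, ignoreCase: true, out CaptureLevel level)
               && Enum.IsDefined(typeof(CaptureLevel), level)
            ? level
            : fallback;
    }
}

public static class Program
{
    public static void Main()
    {
        // "DOTNET_METRICS_GC_LEVEL" is a made-up name for this sketch.
        var gcLevel = CaptureLevelConfig.FromEnvironment(
            "DOTNET_METRICS_GC_LEVEL", CaptureLevel.Informational);

        using IDisposable collector = DotNetRuntimeStatsBuilder
            .Customize()
            .WithGcStats(gcLevel)
            .StartCollecting();

        Console.WriteLine($"Collecting GC stats at capture level {gcLevel}. Press Enter to exit.");
        Console.ReadLine();
    }
}
```

This keeps the cheaper level as the compiled-in fallback, so a more verbose level can be switched on temporarily while investigating a problem.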

Performance impact of CaptureLevel.Errors+

The harder you work the .NET Core runtime, the more events it generates. Event generation and processing costs can add up, especially around these types of events:

  • JIT stats: each method compiled by the JIT compiler emits two events. Most JIT compilation is performed at startup so, depending on the size of your application, this could impact your startup performance.
  • GC stats with CaptureLevel.Verbose: an event is emitted for every 100KB of allocations. If you are consistently allocating memory at a rate above 1GB/sec, consider disabling GC stats.
  • Exception stats with CaptureLevel.Errors: an event is generated for every exception thrown.
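To get a feel for the GC figure above, the snippet below converts an allocation rate into an approximate event rate. It assumes decimal units (100KB = 100,000 bytes) and exactly one event per 100KB, so the numbers are rough estimates; the GcEventRate class is purely illustrative and not part of this package.

```csharp
using System;

public static class GcEventRate
{
    // One event per 100KB allocated, as described above (decimal units assumed).
    private const double BytesPerEvent = 100_000;

    // Approximate number of allocation events emitted per second
    // for a given allocation rate in bytes per second.
    public static double EventsPerSecond(double bytesAllocatedPerSecond) =>
        bytesAllocatedPerSecond / BytesPerEvent;

    public static void Main()
    {
        // 1 GB/sec of allocations
        Console.WriteLine(EventsPerSecond(1_000_000_000));
        // 50 MB/sec of allocations
        Console.WriteLine(EventsPerSecond(50_000_000));
    }
}
```

At 1GB/sec this works out to roughly 10,000 events per second that must be generated and processed, which is why the threshold above is worth keeping in mind.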

Recycling collectors

There have been long-running performance issues since .NET Core 3.1 that could see CPU consumption grow over time when long-running trace sessions are used. Many of these issues have now been addressed in .NET 6.0, but a workaround was also identified: stopping and restarting (AKA recycling) collectors periodically helps reduce CPU consumption:

IDisposable collector = DotNetRuntimeStatsBuilder.Default()
	// Recycles all collectors once every day
	.RecycleCollectorsEvery(TimeSpan.FromDays(1))
	.StartCollecting();

While this has been observed to reduce CPU consumption, the technique has also been identified as a possible cause of application instability.

Behaviour varies by runtime version:

  • .NET Core 3.1: recycling is verified to cause massive instability and cannot be enabled.
  • .NET 5.0: recycling is verified to be beneficial and is enabled by default, once every day.
  • .NET 6.0+: recycling is less necessary because long-standing issues have been addressed, although some users still report it to be beneficial. It is disabled by default but can be enabled.

TL;DR: If you observe CPU usage increasing over time, try enabling recycling. If you see unexpected crashes after adding this library to your application, try disabling recycling.
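The per-runtime guidance above can be written down as a small decision helper. This is only a sketch: RecyclingPolicy is a hypothetical class, not part of this package, and the daily interval simply mirrors the example shown earlier.

```csharp
using System;

public static class RecyclingPolicy
{
    // Hypothetical helper mirroring the per-runtime behaviour listed above.
    // Returns the interval to pass to RecycleCollectorsEvery, or null when
    // recycling should not be requested explicitly.
    public static TimeSpan? ExplicitInterval(Version runtime, bool cpuGrowthObserved)
    {
        if (runtime.Major < 5)
            return null; // .NET Core 3.1: recycling cannot be enabled

        if (runtime.Major == 5)
            return null; // .NET 5.0: daily recycling is already the default

        // .NET 6.0+: disabled by default, opt in only if CPU usage grows over time
        return cpuGrowthObserved ? TimeSpan.FromDays(1) : (TimeSpan?)null;
    }
}

// Possible usage (requires the prometheus-net.DotNetRuntime package):
//
//   var builder = DotNetRuntimeStatsBuilder.Default();
//   var interval = RecyclingPolicy.ExplicitInterval(Environment.Version, cpuGrowthObserved: true);
//   if (interval.HasValue)
//       builder = builder.RecycleCollectorsEvery(interval.Value);
//   IDisposable collector = builder.StartCollecting();
```

Keeping the decision in one place makes it easy to turn recycling off again if you see instability after enabling it.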

Examples

An example docker-compose stack is available in the examples/ folder. Start it with:

docker-compose up -d

You can then visit http://localhost:3000 to view metrics being generated by a sample application.

Grafana dashboard

The metrics exposed can drive a rich dashboard, giving you graphical insight into the performance of your application (exported dashboard available here):

Grafana dashboard sample

Further reading