Opinionated Observability Extensions for .NET

Observability

O[pinionate]d Observability Extensions for .NET.

Quick Start

To use the Observability libraries, first initialize the Observability Host. Currently, only ASP.NET Core hosts are supported.

Add the O9d.Observability.Hosting.AspNet package from NuGet:

dotnet add package O9d.Observability.Hosting.AspNet

You can then update your Startup.cs file to initialize the host:

services.AddObservability(builder =>
{

});

Internally this will initialize an ASP.NET Core Hosted Service that keeps track of all registered instrumentation components.

To start instrumenting your application, add one of the instrumentation packages (discussed in more detail below). For example, to add ASP.NET Core metrics (using Prometheus), add the O9d.Metrics.AspNet package:

dotnet add package O9d.Metrics.AspNet

Then update your Observability startup code:

services.AddObservability(builder =>
{
    builder.AddAspNetMetrics(options => {});
});

One of the design goals of this library is that it should be as unobtrusive as possible, leveraging the built-in diagnostic and activity components of the Core CLR so that adding instrumentation doesn't interfere with other application code or middleware.

Instrumentation Libraries

ASP.NET Core Metrics

The O9d.Metrics.AspNet package adds specific Prometheus metrics that we have found to be the most useful when operationalising HTTP services in production.

After installing the Observability Hosting and ASP.NET Metrics packages in your application, update your Startup.cs as follows:

services.AddObservability(builder =>
{
    builder.AddAspNetMetrics(options => {});
});

By default the library adds the following Prometheus metrics:

http_server_request_duration_seconds

A histogram (default) or summary that tracks the duration in seconds that HTTP requests take to process.

Labels:

Name         Description                                                     Example
operation    A descriptor for the operation and endpoint that was requested  get_customers
status_code  The status code returned by your service                        200

http_server_requests_in_progress

A gauge that tracks the number of requests in progress.

Labels:

Name       Description                                                     Example
operation  A descriptor for the operation and endpoint that was requested  get_customers

http_server_errors_total

A counter that tracks the number of HTTP requests resulting in an error.

Labels:

Name            Description                                                     Example
operation       A descriptor for the operation and endpoint that was requested  get_customers
sli_error_type  The service level indicator error type                          external_dependency
sli_dependency  For dependency error types, the name of the causing dependency  skynet

Calculating Service Availability

With these metrics we can calculate both internal and external service availability. Client-facing availability excludes requests that failed because the client sent an invalid request:

Availability = successful_requests / (total_requests - client_failures)

For example:

Given 100 requests
of which
    70 returned HTTP 200
    10 returned HTTP 500 (Server Error)
    20 returned HTTP 422 (Invalid Client Request)

Availability = (100 - 30) / (100 - 20)
= 87.5%
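
The same arithmetic can be sketched in C#. This is only an illustration of the formula above, not part of the library; the class name, method name, and the decision to report 100% when there is no eligible traffic are assumptions made for the example.

```csharp
using System;

// Hypothetical helper illustrating the availability formula; not part of O9d.
public static class AvailabilitySketch
{
    // errors:       every request counted as an error (client and server).
    // clientErrors: the subset of errors caused by invalid client requests.
    public static double Calculate(long totalRequests, long errors, long clientErrors)
    {
        if (clientErrors < 0 || clientErrors > errors || errors > totalRequests)
        {
            throw new ArgumentException("Expected 0 <= clientErrors <= errors <= totalRequests.");
        }

        // Requests the service could have answered successfully.
        long eligible = totalRequests - clientErrors;

        // Assumption: with no eligible traffic, report full availability.
        if (eligible == 0) return 1.0;

        long successful = totalRequests - errors;
        return (double)successful / eligible;
    }
}
```

With the figures from the example, `AvailabilitySketch.Calculate(100, 30, 20)` returns 0.875.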

To calculate this in Prometheus/Grafana:

(sum(rate(http_server_request_duration_seconds_count[10m])) - sum(rate(http_server_errors_total[10m]) OR on() vector(0))) / 
( 
    sum(rate(http_server_request_duration_seconds_count[10m])) - 
    sum(rate(http_server_errors_total{sli_error_type="invalid_request"}[10m]) OR on() vector(0))
)

Resolving the Operation

The default Prometheus libraries for ASP.NET are quite verbose and can result in a large number of series or high-cardinality labels.

By design, this library only tracks genuine endpoints of your application, since metrics about non-existent endpoints (e.g. bots trying to hit /phpmyadmin) generally offer little value. A metric for unmatched paths is something we're thinking about.

By default the library uses the following approach to resolve the operation name:

  1. The name of the route, if set on your controller action, for example:

    [HttpGet("status/{code:int}", Name = "get_status")]

  2. Otherwise, a combination of the HTTP verb and route template, e.g. PUT /customers/{id}

In general we recommend explicitly naming your route to avoid your metrics changing if your URI structure is updated.
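
The resolution order described above can be sketched as a small function. This is a hypothetical illustration rather than the library's actual implementation; the names `OperationNameSketch` and `Resolve`, and returning null for requests that matched no route, are assumptions.

```csharp
using System;

// Hypothetical sketch of the operation name resolution order; not the library's code.
public static class OperationNameSketch
{
    public static string? Resolve(string? routeName, string httpMethod, string? routeTemplate)
    {
        // 1. Prefer the explicit route name, e.g. Name = "get_status".
        if (!string.IsNullOrWhiteSpace(routeName)) return routeName;

        // No matched route template: assume the request is not tracked.
        if (string.IsNullOrWhiteSpace(routeTemplate)) return null;

        // 2. Fall back to "VERB /template", e.g. "PUT /customers/{id}".
        return $"{httpMethod.ToUpperInvariant()} /{routeTemplate.TrimStart('/')}";
    }
}
```

Because the fallback embeds the route template, renaming a URI segment changes the label value, which is why an explicit route name gives more stable series.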

Tracking Errors

By default the following status codes are treated as errors:

  • 400 - 499 - Error Type: Invalid Request
  • 500 and above - Error Type: Internal
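
As a rough sketch, the default classification could look like the following. The enum and class names are invented for this example and are not the library's types. The sketch assumes that a status of exactly 500 counts as an internal error, which matches the availability example earlier where HTTP 500 is a server error.

```csharp
// Hypothetical types for illustration only; the library defines its own ErrorType.
public enum SketchErrorType { None, InvalidRequest, Internal }

public static class StatusCodeClassifierSketch
{
    public static SketchErrorType Classify(int statusCode) => statusCode switch
    {
        >= 500 => SketchErrorType.Internal,        // server-side failures
        >= 400 => SketchErrorType.InvalidRequest,  // 400 - 499: client errors
        _ => SketchErrorType.None                  // everything else is not an error
    };
}
```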

What we can't track automatically are errors that are the result of internal or external dependencies. For these you have two options:

  1. Set the SLI error using HttpContext.SetSliError(), for example:

    HttpContext.SetSliError(ErrorType.ExternalDependency, "skynet");

  2. Throw an SliException (or any derived type), for example:

    throw new SliException(ErrorType.ExternalDependency, "skynet");

Customizing Metrics

The AspNetMetricsOptions class includes a number of options to customize the metrics created by the library. Each metric listed above has an associated ConfigureX property that can be used to customize the underlying Prometheus metric configuration. For example, to set the buckets used by the Request Duration Histogram metric:

services.AddObservability(builder =>
    builder.AddAspNetMetrics(options =>
        options.ConfigureRequestDurationHistogram = histogram =>
        {
            histogram.Buckets = new[] { 0.1, 0.2, 0.5, 0.75, 1, 2 };
        }
    )
);
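
When choosing buckets it helps to remember that Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and an implicit +Inf bucket counts all of them. The sketch below illustrates that behaviour in plain C#. It is not how the Prometheus client is implemented, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of cumulative histogram buckets; not the Prometheus client.
public static class HistogramBucketSketch
{
    // Returns one cumulative count per upper bound, followed by the +Inf bucket.
    public static long[] CumulativeCounts(double[] upperBounds, IEnumerable<double> observations)
    {
        var counts = new long[upperBounds.Length + 1];
        foreach (var value in observations)
        {
            for (int i = 0; i < upperBounds.Length; i++)
            {
                // An observation lands in every bucket whose bound it does not exceed.
                if (value <= upperBounds[i]) counts[i]++;
            }
            counts[^1]++; // the +Inf bucket counts every observation
        }
        return counts;
    }
}
```

For the buckets above and request durations of 0.05, 0.3, 0.3, 1.5 and 3 seconds, the counts are 1, 1, 3, 3, 3, 4 and 5. The 3 second request appears only in the +Inf bucket, so a largest bound of 2 gives no resolution for anything slower.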

Using a summary instead of a histogram to track request duration

We recommend using histograms (the default) if you are running multiple instances of your application since they can be aggregated. If you are happy with the trade-offs of using Summary metrics, you can switch the request duration metric type like so:

services.AddObservability(builder =>
    builder.AddAspNetMetrics(options =>
        options.RequestDurationMetricType = ObserverMetricType.Summary
    )
);

Grafana Dashboard

We've created a Grafana Dashboard that leverages the metrics generated by O9d.Metrics.AspNet. You can see it in action by running the examples, and you can install it from Grafana Labs.

For the dashboard to work you should add an app label with the name of your application. This can be done by your agent or directly within your application using static labels:

Prometheus.Metrics.DefaultRegistry.SetStaticLabels(new Dictionary<string, string>
{
    { "app", "aspnet-example" },
    { "env", "prod" }
});

Extending O9d.Observability

This project was heavily inspired by the OpenTelemetry libraries for .NET.

We wanted to make it easy to plug in additional instrumentation without a lot of ceremony. Suppose you want to instrument operations in the DazzleDB .NET client. Fortunately the client already emits events to a Diagnostic Source and the Observability library makes it easy to tap into them.

Create an observer

Create a class that implements IObserver<KeyValuePair<string, object?>> to receive Diagnostic Listener events:

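internal class DazzleDbMetricsObserver : IObserver<KeyValuePair<string, object?>>
{
    public void OnNext(KeyValuePair<string, object?> value) { /* record metrics for the event here */ }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}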
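
To make the observer contract more concrete, here is a minimal, self-contained sketch that uses only System.Diagnostics. DazzleDB is the fictional client from the text, so the source name "DazzleDb", the event name "DazzleDb.QueryCompleted" and its payload are all invented for the example. A real observer would record Prometheus metrics rather than count events in a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical observer that counts events by name. Not thread-safe; a real
// observer would record into thread-safe metric instruments instead.
internal sealed class EventCountingObserver : IObserver<KeyValuePair<string, object?>>
{
    private readonly Dictionary<string, int> _counts = new();

    public int CountFor(string eventName) =>
        _counts.TryGetValue(eventName, out var count) ? count : 0;

    public void OnNext(KeyValuePair<string, object?> value)
    {
        // value.Key is the event name; value.Value is the payload object.
        _counts[value.Key] = CountFor(value.Key) + 1;
    }

    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class ObserverDemo
{
    public static int Run()
    {
        // Stand-in for the diagnostic source a client library would own.
        using var source = new DiagnosticListener("DazzleDb");
        var observer = new EventCountingObserver();
        using var subscription = source.Subscribe(observer);

        // Libraries typically check IsEnabled before building a payload.
        if (source.IsEnabled("DazzleDb.QueryCompleted"))
        {
            source.Write("DazzleDb.QueryCompleted", new { DurationMs = 12.5 });
        }

        return observer.CountFor("DazzleDb.QueryCompleted");
    }
}
```

`ObserverDemo.Run()` returns 1, because the single event written by the source is delivered synchronously to the subscribed observer.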

Add the O9d.Observability package

dotnet add package O9d.Observability

Create an extension for Observability Builder

public static class DazzleDbObservabilityBuilderExtensions
{
    public static IObservabilityBuilder AddDazzleDbMetrics(this IObservabilityBuilder builder)
    {
        if (builder is null) throw new ArgumentNullException(nameof(builder));

        return builder.AddDiagnosticSource("DazzleDb", new DazzleDbMetricsObserver());
    }
}

The above code makes use of the AddDiagnosticSource extension to handle the boilerplate DiagnosticSource subscription logic and ensure subscribers are tracked.
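
For context, the kind of boilerplate this saves usually looks like the sketch below: subscribe to DiagnosticListener.AllListeners, wait for a listener with the expected name, then attach the event observer and keep the subscription so it can be disposed later. This is one common way to write it and is not taken from the library's source. The type names `NamedSourceSubscriber` and `CallbackObserver` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: attaches an observer to every DiagnosticListener with a given name.
internal sealed class NamedSourceSubscriber : IObserver<DiagnosticListener>, IDisposable
{
    private readonly string _sourceName;
    private readonly IObserver<KeyValuePair<string, object?>> _observer;
    private readonly List<IDisposable> _subscriptions = new();
    private readonly IDisposable _allListenersSubscription;

    public NamedSourceSubscriber(string sourceName, IObserver<KeyValuePair<string, object?>> observer)
    {
        _sourceName = sourceName;
        _observer = observer;
        // AllListeners replays listeners that already exist and reports new ones.
        _allListenersSubscription = DiagnosticListener.AllListeners.Subscribe(this);
    }

    public void OnNext(DiagnosticListener listener)
    {
        if (listener.Name != _sourceName) return;
        lock (_subscriptions)
        {
            // Track the subscription so it can be released on Dispose.
            _subscriptions.Add(listener.Subscribe(_observer));
        }
    }

    public void OnError(Exception error) { }
    public void OnCompleted() { }

    public void Dispose()
    {
        _allListenersSubscription.Dispose();
        lock (_subscriptions)
        {
            foreach (var subscription in _subscriptions) subscription.Dispose();
            _subscriptions.Clear();
        }
    }
}

// Minimal observer used to exercise the subscriber in this sketch.
internal sealed class CallbackObserver : IObserver<KeyValuePair<string, object?>>
{
    private readonly Action<string> _onEvent;
    public CallbackObserver(Action<string> onEvent) => _onEvent = onEvent;
    public void OnNext(KeyValuePair<string, object?> value) => _onEvent(value.Key);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```

Creating a `NamedSourceSubscriber("DazzleDb", observer)` before or after the client creates its listener works either way, since AllListeners reports both existing and future listeners.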

Package your library and update your applications

services.AddObservability(builder =>
{
    builder.AddDazzleDbMetrics();
});