nginxinc/nginx-otel

Add client spans around calls to upstream

huggsboson opened this issue · 9 comments

Is your feature request related to a problem? Please describe

Services usually have a server span upon request entry and client span (span.kind = client) around calls to upstream.

This helps with a few things:

  1. You can diagnose whether time is being spent or errors in the service itself or its upstreams.
  2. Tracing products use this information for useful visualisations e.g. "inferred services" for services that haven't adopted tracing it still shows up as a dependency.

Describe the solution you'd like

  1. Output a new span on each calls to upstream
  2. Allow adding attributes/baggage only to that client span

Describe alternatives you've considered

Putting a proxy between nginx and its upstreams.

Nginx has a set of variables associated used to measure time: $request_time, $session_time, $connection_time, $upstream_connect_time, $upstream_first_byte_time, $upstream_header_time, $upstream_response_time, $upstream_queue_time, $upstream_session_time. One can use these variables in spans to get visibility into that.

The $request_time variable is a non-cacheable variable. This means that nginx computes it every time you call it. And this can be used in a very powerful way. If you call the $request_time variable at the right moment during the request processing (and you can do it during different moments), you can get additional measurements.

For example, consider the following configuration:

server {
...
    map $request_time $proxy_delay {
        default $request_time;
    }

    map $proxy_delay $stub {
        default '';
    }
...
    location / {
        proxy_pass http://u/;
        proxy_set_header x-stub $stub;
    }
}

In the nginx configuration provided, we combine the non-cacheable variable $request_time with cacheable variables $proxy_delay and $stub from the map statement. We calculate $request_time only once during the calculation of $stub variable for proxy_set_header directive when we form a request to upstream. As we need to cache $request_time value to reuse during the log phase (if you refer it nginx-otel span configuration), we wrap it with the map from $proxy_delay. map variables are cacheable, so the value of $proxy_delay is cached when we call it the first time (in our example, we do it when we reached the point where we prepare a request to be proxied to upstream). In order to avoid sending unnecessary header “x-stub”, we use another map wrapper that always maps to the empty value (‘’). Proxy_set_header directive does not add an empty header to a proxied request.

Using such patter one can get a rich set of time measurements. There are some internals though that useful to know. For example, nginx computes time once per even loop pass. So this is not an exact precision. But in 99.99% cases it should be good enough.

Will this solve the use cases your shared?

It sounds like a possible workaround, but creating client spans around your upstream calls is the standard way of doing these things and like I mentioned a ton of observability tools have built useful visualizations around that. Is there context/a reason creating extra spans at the appropriate moments (especially client spans) isn't a good fit for nginx?

This is reasonable request, but Nginx doesn't provide clear "hook" points around upstream calls in all cases. The only option I can think of so far is to introduce new otel_trace_upstream directive in upstream block, but such directive will be required in every upstream block and it won't cover cases without explicit upstream...
PR linked above seems to be adding client spans inside server spans, not around calls to upstream, so both of these spans will have the same timings.

I agree that the ability to create client spans would be a very useful feature.

Many experience around OpenTelemetry traces are built on the assumption that one service calling another service synchronously is modeled by a parent client span and a child server span. HTTP client instrumentation creates a client span, HTTP server framework instrumentations create a server span (client -> server).

If you now have nginx between the instrumented HTTP client and the instrumented HTTP server framework, you end up with a span structure client -> server -> server, which for most tools and backends doesn't make sense, and can break experiences like service graphs. A span structure like client -> server (nginx) -> client (nginx) -> server allows for a much better experience, and cleanly integrate nginx as an intermediate service.

PR linked above seems to be adding client spans inside server spans, not around calls to upstream, so both of these spans will have the same timings.

It would be great to have correct timings, however, given the status quo having client and server spans with the same timings are the lesser evil (as opposed to the client -> server -> server span structure).

I found this while searching for information about adding spans to subrequests.

In my situation: frontend -> nginx -> 2 subrequests -> proxy to backend

@bmerigan, subrequests tracing might deserve its own issue. Also, can you share config example so it's clearer what those subrequests are used for (e.g. authentication)?

@p-pautov yes it is indeed for authentication (oauth2) and authorization (opa). I'd like those 2 sub_requests to get recorded as well.

Just to demonstrate this problem, here is an example of how Azure Monitor is connecting ingress-nginx with my backend:

image

As you can see, frontend (yasm-frontend) is talking to ingress-nginx, but the connection from ingress-nginx to backend (yasm-backend) is not being drawn.

Same for Grafana/Tempo:

image

and backend separately:

image