Azure/azure-functions-dotnet-worker

Don't create fake activity on worker with the same Id as the host one

lmolkova opened this issue · 7 comments

activity.SetId(context.TraceContext.TraceParent);

The code above creates a 'fake' activity on the worker that imitates host activity and allows to omit client/server gRPC spans (which are probably too verbose to be on by default).

This behavior is the best that's possible today dues to limitations: open-telemetry/opentelemetry-dotnet#5286 and the underlying dotnet/runtime#86966. Which get no traction because of open-telemetry/opentelemetry-specification#4271

Unfortunately it leads to fun issues such as dotnet/aspire#6441.

If someone listens to all activity sources (AddSource("*")) , both host and worker activity will be generated. They have identical ids, that may or may not be supported by the tooling.


I don't see how anything can be solved on the Functions side until open-telemetry/opentelemetry-specification#4271 is done. I'll try to update the spec.

@lmolkova do you know if there is something we can do in the meantime until OTel has its spec? Can we add a tag to our span to tell everyone to ignore it?

@jviau

I can only think of giving users ability to opt out of InvokeFunctionAsync activity. The way it works today is that it's created no matter what (even if there is no listener) and you get activity events if you subscribe to empty source (with AddSource("*")).

According to #2733 it was a breaking change, but it maybe even more breaking to change it back now after several months. or make it opt-in (by emitting it in conventional way via ActivitySource with a name people can subscribe to).

It seems logical though to at least provide a config option to turn it off given that it has all these side-effects (#2875, Azure/azure-functions-host#10641).

I'd suggest one of the following:

  1. This activity should be created through ActivitySource with a name like Microsoft.Azure.Functions.Worker. UseFunctionsWorkerDefaults would add corresponding source, but provide an option to disable it

OR

  1. User-defined middleware should be able to run before this call and, if it creates an activity, this one would not be created -

Yet another time I wish OTel had an ability to remove the source (open-telemetry/opentelemetry-dotnet#4371) that's been added and then we would not need to expose yet another config option.

jviau commented

@lmolkova I have a different idea: what if we just don't call stop on this activity (or dispose it)? No stop, no OnEnd event, no activity being exported.

It'd still have all the context-propagation side-effects (i.e. it would be on Activity.Current)

jviau commented

That is the point of it though - we want to make sure the span the host will be emitting is available in the worker for parent/child correlation.

That is the point of it though - we want to make sure the span the host will be emitting is available in the worker for parent/child correlation.

It tries to, but it seems to do something weird: #2875, Azure/azure-functions-host#10641

I think what you're saying here is that you want Activity.Current to be populated with a new span with the same span id and trace id as the span from the host.

This isn't how tracing should work. Each operation is a different span, so the worker should be creating a new span, that we can listen to, in the normal way.

Right now we can listen to is using .AddLegacySource("InvokeFunctionAsync") which is the old way and relies on you know the Operation Name of the activity..
The Activity "Should" use an ActivitySource here, which affords some nicer ways to deal with it, however, it runs the risk that Activity.Current is null.

The first thing for me would be allow the same previous functionality here, which is that Activity.Current is null, using some kind of flag.

Going forward, I should be able to listen to this Activity, which means giving it an AcitivtySource so I can enable it as Liudmila suggested.

The wider problem is that the host span isn't accessible outside of Flex Consumption plans, which means we'll still have broken/unusable tracing in Azure Functions, which would make this not a priority.