micrometer-metrics/context-propagation

Possibility for ContextExecutorService to create new spans

Closed this issue · 10 comments

In Sleuth, the implemented executors produced a new span for each submitted task. In the case of parallel executions, each thread had a different span, so the MDC worked well in each thread.

Using the following snippets, the existing span is shared between threads, so the MDC (which is naturally bound to a thread) might behave strangely.

private ExecutorService produceExecutor(int threads) {
    ContextRegistry.getInstance().registerThreadLocalAccessor(new ObservationAwareSpanThreadLocalAccessor(this.tracer)); //issue #120 
    return ContextExecutorService.wrap(
        Executors.newFixedThreadPool(threads), ContextSnapshot::captureAll);
}

@Bean
public TaskDecorator parallelTaskDecorator(Tracer tracer) {
    return runnable -> {
        ContextRegistry.getInstance().registerThreadLocalAccessor(new ObservationAwareSpanThreadLocalAccessor(tracer));
        return ContextSnapshot.captureAll().wrap(runnable);
    };
}

For example, when the MDC is decorated from Brave baggage, if the same value is written from two different threads, only one thread's MDC ends up with the correct value, because the MDC is updated only when the decorator thinks an update is needed (see the simplified sketch after the list):

  • "myBaggage=foo" is put in the Brave baggages by thread-1 -> MDC bound to thread-1 is updated, and MDC bound to thread-2 is not (correctly);
  • then "myBaggage=foo" is put in the Brave baggages by thread-2 -> MDC bound to thread-2 is still not updated: the decorator believes there is no need to update the MDC, as the new value is equal to the old one... and the old value was already written in "the" MDC.

While sharing the span between all the tasks might be useful in certain cases, I feel the default should be the same as the old Sleuth implementation, so that all the existing ThreadLocal logic keeps working as before. If this proposed default behavior is not desirable, the library should at least provide an implementation with that behavior. I think #191 is not relevant to this issue.

The context-propagation library is about propagating context. It has no notion of creating new "objects" such as spans. In Tracing we don't want to create new spans out of the box because we don't know how to name them properly, etc.

I think it's up to the user to create new spans if they want to in the new threads.
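
For illustration, a minimal sketch of doing that with the Micrometer Tracing API (the helper name and the "async-task" span name are made up for the example):

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;

// Sketch only: wrap a task so it opens and closes its own span inside the worker thread.
final class NewSpanPerTask {

    static Runnable withNewSpan(Tracer tracer, String spanName, Runnable task) {
        return () -> {
            Span span = tracer.nextSpan().name(spanName).start();
            try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
                task.run();
            }
            finally {
                span.end();
            }
        };
    }
}

// usage: executor.submit(NewSpanPerTask.withNewSpan(tracer, "async-task", () -> doWork()));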

You are right that context-propagation is not the right place for such a feature (even if ContextExecutorService lives here - should I reopen this issue in Tracing?), but I think we are missing a powerful feature from Sleuth: it was easy to propagate the context and use the MDC. Something like TraceableExecutorService(BeanFactory beanFactory, ExecutorService delegate) (the span name parameter was nullable) is easy to use: it is intuitive to build such an object (beanFactory can be injected and delegate is a plain executor) and to use it properly.
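
For comparison, the Sleuth-era wiring this refers to looked roughly like the following (a sketch from memory, using the constructor described above; the pool size is an arbitrary example):

@Bean
public ExecutorService tracedExecutor(BeanFactory beanFactory) {
    // org.springframework.cloud.sleuth.instrument.async.TraceableExecutorService
    return new TraceableExecutorService(beanFactory, Executors.newFixedThreadPool(4));
}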

Now consider what it currently takes in Micrometer to decorate the MDC from baggage:

ExecutorService delegate = buildMyCustomExecutorThatHandlesSpans();
ContextRegistry.getInstance().registerThreadLocalAccessor(new ObservationAwareSpanThreadLocalAccessor(this.tracer));
ExecutorService executor = ContextExecutorService.wrap(delegate, ContextSnapshot::captureAll);

The custom-executor-that-handles-spans should be no more custom than any Sleuth executor was. Moreover, while registering the ThreadLocalAccessor manually can be a powerful and useful feature, in the default case it feels like boilerplate, or is simply confusing. Should each project reimplement the old Sleuth executors? Writing such an executor correctly is not trivial, and something well tested by experts and the community would be far better than non-production-ready implementations scattered around.

That Sleuth feature wasn't actually a good idea IMO. Maybe you should create your own wrapper around the executor that automatically creates a span and puts it in scope?
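
For example, such a wrapper could look roughly like the following sketch (not a tested or official implementation): it mirrors the TaskDecorator from the first comment, assumes the ObservationAwareSpanThreadLocalAccessor registration shown there, and uses "async-task" as an arbitrary span name. The captured context is restored first, then a new span is opened inside the worker thread, so each task gets its own span and, through it, its own MDC entries.

@Bean
public TaskDecorator spanCreatingTaskDecorator(Tracer tracer) {
    return runnable -> ContextSnapshot.captureAll().wrap(() -> {
        // runs on the worker thread, after the captured context has been restored
        Span span = tracer.nextSpan().name("async-task").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            runnable.run();
        }
        finally {
            span.end();
        }
    });
}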

Sure, there might be a better way to do it, but currently MDC decorators (and, by extension, logging) do not work properly when executors are involved. I don't think I'm the only crazy person who wants to log during an async execution :) Are there plans to handle this differently?

I don't understand why they would not work. Micrometer Tracing works with Brave and OTel, and context propagation with them is working fine.

No, it is not working, as I explained in the first comment. Look for example at the logic in Brave's CorrelationScopeDecorator (a reproduction sketch follows the table below):

  • suppose myField has value A in the original context/MDC
  • we submit two tasks to an executor that uses two additional threads (thread-1 and thread-2)
  • we update myField in thread-1: the baggage and the MDC of thread-1 are updated
  • we update myField in thread-2 with the same value: the baggage is "updated" (there is nothing to change), but the MDC of thread-2 is not, because the decorator logic assumes there is nothing to update
operation                        Brave baggage   original MDC   thread-1 MDC   thread-2 MDC
init                             myField = A     myField = A    myField = A    myField = A
update myField = B in thread-1   myField = B     myField = A    myField = B    myField = A
update myField = B in thread-2   myField = B     myField = A    myField = B    myField = A
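
A rough reproduction sketch of the table (hedged: it assumes myField is the BaggageField registered with the Brave correlation configuration, with updates flushed to the MDC, that it currently holds "A" in the submitting thread, and that executor is the context-propagating executor from the first comment with two worker threads; ordering and thread assignment are simplified, and the reproduce method name is made up):

import java.util.concurrent.ExecutorService;

import brave.baggage.BaggageField;
import org.slf4j.MDC;

// Illustration only: the second task sets the same value, so the correlation decorator
// sees nothing to update and the MDC of that worker thread keeps the stale "A".
void reproduce(ExecutorService executor, BaggageField myField) {
    executor.submit(() -> {
        myField.updateValue("B");                                  // baggage -> B, this thread's MDC -> B
        System.out.println("thread-1 MDC: " + MDC.get("myField"));
    });
    executor.submit(() -> {
        myField.updateValue("B");                                  // same value: the MDC write is skipped
        System.out.println("thread-2 MDC: " + MDC.get("myField")); // still shows A
    });
}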

I'm a bit confused about this issue.
We had a somewhat similar issue where we decided not to instrument these components for now: micrometer-metrics/micrometer#3881
You can see how to do it on your own in our tests: https://github.com/micrometer-metrics/micrometer/blob/09b81f4adc4e8944bb8ed843756850721cc17fa2/micrometer-observation/src/test/java/io/micrometer/observation/CurrentObservationTest.java#L57-L70

If baggage propagation does not work, please take a look at this issue first: micrometer-metrics/micrometer#3908. If that does not help, could you please open a new issue in Micrometer Tracing with a minimal sample project that reproduces the problem, so we can investigate more easily and make sure any fix works for your use case?

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.