medusa-ui/medusa

Add OpenTelemetry

kevindeyne opened this issue · 2 comments

We have a custom setup for how frontends work on top of WebFlux, which many tracing agents already support poorly. It would be beneficial if we attached OpenTelemetry spans to important flows ourselves.

Further reading: https://www.baeldung.com/spring-boot-opentelemetry-setup
and https://opentelemetry.io/docs/

We can test this in Sentry.io. I already have Sentry set up for our showcase, but you'll see that only limited data is available: essentially, only the initial calls get traced.

I think we mostly just need the traces/spans defined; Sentry.io, or any other compatible collector others use, can take care of the rest. This explains more about how to do that:
https://opentelemetry.io/docs/instrumentation/java/manual/

I was able to set up a local instance of this. I'm not sure it's worth including everything, but for what it's worth, this was my process:

Set up Jaeger locally via Docker (port 16686 serves the Jaeger UI; 4317 is the OTLP/gRPC port the exporter below sends to):

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp -p 6832:6832/udp \
  -p 5778:5778 -p 16686:16686 \
  -p 4317:4317 -p 4318:4318 \
  -p 14250:14250 -p 14268:14268 -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.46

Add dependencies:

        <!-- tracing -->
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-api</artifactId>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-sdk</artifactId>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-exporter-otlp</artifactId>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-semconv</artifactId>
            <version>1.27.0-alpha</version>
        </dependency>

Then build a tracer configuration:

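import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TracingConfig {

    private final OpenTelemetry openTelemetry;

    public TracingConfig(@Value("${spring.application.name:medusa-ui}") String applicationName) {
        // Tag every span and metric with the service name shown in Jaeger/Sentry
        Resource resource = Resource.getDefault()
                .merge(Resource.create(Attributes.of(ResourceAttributes.SERVICE_NAME, applicationName)));

        // Batch finished spans and ship them over OTLP/gRPC (default endpoint: http://localhost:4317)
        SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(OtlpGrpcSpanExporter.builder().build()).build())
                .setResource(resource)
                .build();

        // Note: Jaeger only ingests traces, so this exporter needs a backend that accepts OTLP metrics
        SdkMeterProvider sdkMeterProvider = SdkMeterProvider.builder()
                .registerMetricReader(PeriodicMetricReader.builder(OtlpGrpcMetricExporter.builder().build()).build())
                .setResource(resource)
                .build();

        // Propagate trace context via W3C traceparent headers and register as the global instance
        openTelemetry = OpenTelemetrySdk.builder()
                .setTracerProvider(sdkTracerProvider)
                .setMeterProvider(sdkMeterProvider)
                .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
                .buildAndRegisterGlobal();
    }

    @Bean
    public Tracer buildTracer() {
        return openTelemetry.getTracer("medusa-ui", "1.0.0");
    }
}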
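To confirm that spans are produced, and nested the way you expect, without running Jaeger, the same SDK can be pointed at an exporter that simply keeps finished spans in memory. This is a minimal sketch that uses only `opentelemetry-sdk` from the dependencies above; the `LocalSpanCheck` and `CollectingSpanExporter` classes and the `DiffEngine.calculate` span name are hypothetical, chosen for illustration.

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LocalSpanCheck {

    /** Hypothetical exporter that keeps finished spans in memory instead of sending them anywhere. */
    static final class CollectingSpanExporter implements SpanExporter {
        final List<SpanData> finished = new CopyOnWriteArrayList<>();

        @Override
        public CompletableResultCode export(Collection<SpanData> spans) {
            finished.addAll(spans);
            return CompletableResultCode.ofSuccess();
        }

        @Override
        public CompletableResultCode flush() {
            return CompletableResultCode.ofSuccess();
        }

        @Override
        public CompletableResultCode shutdown() {
            return CompletableResultCode.ofSuccess();
        }
    }

    /** Creates a parent span with one nested child and returns "name <- parentName" per finished span. */
    static List<String> collectDemoSpans() {
        CollectingSpanExporter exporter = new CollectingSpanExporter();
        // SimpleSpanProcessor exports each span as soon as it ends: fine for a local check, not for production
        SdkTracerProvider provider = SdkTracerProvider.builder()
                .addSpanProcessor(SimpleSpanProcessor.create(exporter))
                .build();
        Tracer tracer = provider.get("medusa-ui", "1.0.0");

        Span parent = tracer.spanBuilder("ActionHandler.execute").startSpan();
        try (Scope ignored = parent.makeCurrent()) {
            // Started while the parent is current, so the SDK records it as the parent's child
            Span child = tracer.spanBuilder("DiffEngine.calculate").startSpan();
            child.end();
        } finally {
            parent.end();
        }
        provider.close();

        // Resolve each span's parent ID back to a name; the root span has no parent in the list
        List<String> result = new ArrayList<>();
        for (SpanData span : exporter.finished) {
            String parentName = "(root)";
            for (SpanData candidate : exporter.finished) {
                if (candidate.getSpanId().equals(span.getParentSpanId())) {
                    parentName = candidate.getName();
                }
            }
            result.add(span.getName() + " <- " + parentName);
        }
        return result;
    }

    public static void main(String[] args) {
        collectDemoSpans().forEach(System.out::println);
    }
}
```

Because the child span is started while the parent is current, it is recorded as a child of `ActionHandler.execute`; that parent/child nesting is what lets a trace viewer break an action's total time down per step.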

Autowire the tracer, then create spans around the flows you want to see. For example:

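        // Uses io.opentelemetry.api.trace.Span, io.opentelemetry.api.trace.StatusCode and io.opentelemetry.context.Scope
        Span span = tracer.spanBuilder("ActionHandler.execute")
                .startSpan();
        span.setAttribute("bean", bean.getClass().getName());
        try (Scope scope = span.makeCurrent()) {
            // ... the work being traced; spans started here become children of this one ...
        } catch (Exception e) {
            // Mark the span as failed so the error shows up in the trace
            span.recordException(e);
            span.setStatus(StatusCode.ERROR);
            throw e;
        } finally {
            span.end();
        }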
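The try/finally pattern above ends the span when the enclosing method returns. On WebFlux, a handler often returns a `Mono` that does its real work later, possibly on another thread, so a span closed that way would end before the work actually runs. A minimal sketch of one way around this, assuming `reactor-core` is on the classpath (as it is with WebFlux); `ReactiveSpans.traced` is a hypothetical helper, not an OpenTelemetry or Reactor API:

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import reactor.core.publisher.Mono;

import java.util.function.Supplier;

public final class ReactiveSpans {

    private ReactiveSpans() {
    }

    /**
     * Hypothetical helper: the span ends when the Mono completes, fails or is cancelled,
     * rather than when the method that builds the Mono returns.
     */
    public static <T> Mono<T> traced(Tracer tracer, String spanName, Supplier<Mono<T>> work) {
        // defer() starts a fresh span per subscription instead of once at assembly time
        return Mono.defer(() -> {
            Span span = tracer.spanBuilder(spanName).startSpan();
            Mono<T> mono;
            // The scope only covers building the pipeline on this thread; operators that run
            // later on other threads will not see this span as the current one
            try (Scope ignored = span.makeCurrent()) {
                mono = work.get();
            } catch (RuntimeException e) {
                span.recordException(e);
                span.setStatus(StatusCode.ERROR);
                span.end();
                return Mono.error(e);
            }
            return mono
                    .doOnError(e -> {
                        span.recordException(e);
                        span.setStatus(StatusCode.ERROR);
                    })
                    .doFinally(signal -> span.end());
        });
    }
}
```

Usage might look like `return ReactiveSpans.traced(tracer, "ActionHandler.execute", () -> handle(request));`, where `handle` stands in for whatever builds the reactive response. Spans created inside later operators would still need the context passed along explicitly to nest under this one.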

In addition, I added a lot of spans to find where Medusa's main performance problems lie.

The weakest point is clearly the DiffEngine.
An action call can easily take 125ms to re-render, yet the rendering itself only takes microseconds, so Thymeleaf itself is incredibly efficient. What takes time is the diffing.
Of those 125ms, 106ms (about 85%) is spent in:
- 58ms = HTMLLayerBuildupEngineLogic.recursive()
- 48ms = HTMLLayerBuildupEngineLogic.initialParse()
We then lose a little more time in Engine.calculateForLayer, up to 10ms in total (8%).
I've added an issue for this: medusa-ui/diff-engine#24
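Worked out from the span timings above (all in ms, against the 125ms action call), the shares are:

```math
\frac{58 + 48}{125} = \frac{106}{125} = 0.848 = 84.8\%, \qquad \frac{10}{125} = 0.08 = 8\%
```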