opentelemetry-clojure WIP

Status: abandoned and unfinished; use steffan-westcott/clj-otel instead.

Clojure wrapper of opentelemetry-java.

Goals:

  • Thin wrapper around Java classes.
  • Minimal overhead over the Java implementation, since performance is a stated goal of OpenTelemetry.
  • TODO: find the upstream OpenTelemetry wording. In short, this wrapper can be used by library authors in the way the upstream Java library expects. It does so by depending only on the API artifacts, not on the SDK ones. A Clojure library that instruments its code with opentelemetry-clojure therefore does not force its users to use OpenTelemetry at all.
    • OpenTelemetry separates the API library from the SDK library. The goal is to be able to use API code anywhere. Without any SDK on the classpath, the API is almost a no-op. Libraries can therefore instrument themselves without forcing their consumers to use OpenTelemetry.
  • Keep the original naming of things, such as Context, Resource, Baggage, Span, etc.

Non-goals:

  • Providing an opinionated solution for in-process context propagation between threads. Instead, provide several solutions and document their caveats.
  • TODO

Usage

Run examples in a REPL:

clj -A:example

Jetty example:

  1. cd examples; docker-compose up
  2. clj -A:example

Run tests:

bin/kaocha

Documentation (WIP, unstructured)

Span

  • kind: list of possible kinds. If the given kind is not a valid one, fall back to INTERNAL, following the Java implementation
  • usage: build a SpanBuilder and start it elsewhere
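The two notes above can be sketched as follows, assuming opentelemetry-api is on the classpath. A keyword is mapped to a SpanKind, with INTERNAL as the fallback. A SpanBuilder is prepared in one function and started in another. `->span-kind`, `span-builder` and `start-and-run` are hypothetical names, not this library's API.

```clojure
(ns example.span
  (:import (io.opentelemetry.api GlobalOpenTelemetry)
           (io.opentelemetry.api.trace SpanBuilder SpanKind Tracer)))

;; Hypothetical helper: unknown or missing kinds fall back to INTERNAL.
(defn ->span-kind ^SpanKind [k]
  (case k
    :server   SpanKind/SERVER
    :client   SpanKind/CLIENT
    :producer SpanKind/PRODUCER
    :consumer SpanKind/CONSUMER
    SpanKind/INTERNAL))

;; Hypothetical helper: returns a SpanBuilder that has not been started yet.
(defn span-builder ^SpanBuilder [^Tracer tracer ^String span-name kind]
  (-> (.spanBuilder tracer span-name)
      (.setSpanKind (->span-kind kind))))

;; Hypothetical helper: starts the span elsewhere, ending it once f returns.
(defn start-and-run [^SpanBuilder builder f]
  (let [span (.startSpan builder)]
    (try
      (f)
      (finally (.end span)))))

(defn handle-request []
  (let [builder (span-builder (GlobalOpenTelemetry/getTracer "example.span")
                              "handle-request"
                              :server)]
    (start-and-run builder (fn [] :handled))))
```

Without an SDK on the classpath, the tracer is a no-op, so this runs but records nothing.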

Get a tracer instance

First, you need an OpenTelemetry instance:

  • get GlobalOpenTelemetry or DefaultOpenTelemetry
  • if you want to configure it:
    • either include the autoconfigure artifact (opentelemetry-sdk-extension-autoconfigure)
      • TODO: link to the autoconfiguration spec
      • and then call GlobalOpenTelemetry/get ...
    • or build the OpenTelemetrySdk instance manually:
      • either just build it, then keep the value around yourself (not registered globally)
      • or register it globally, then use core/tracer
        • should the tracer instance be memoized or not?
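Here is a minimal sketch of the manual route, assuming opentelemetry-sdk is on the classpath. It builds an OpenTelemetrySdk without registering it globally, and the caller keeps the value. No span processor or exporter is configured, so spans go nowhere. A real setup would add them on the SdkTracerProvider builder. Calling `.buildAndRegisterGlobal` instead of `.build` would register the instance as GlobalOpenTelemetry. `shutdown!` is a hypothetical helper.

```clojure
(ns example.sdk
  (:import (io.opentelemetry.sdk OpenTelemetrySdk)
           (io.opentelemetry.sdk.trace SdkTracerProvider)
           (java.util.concurrent TimeUnit)))

;; Keep a handle on the provider so it can be flushed and shut down later.
(def ^SdkTracerProvider tracer-provider
  (.build (SdkTracerProvider/builder)))

;; Not registered globally: this value is passed around explicitly.
(def ^OpenTelemetrySdk sdk
  (-> (OpenTelemetrySdk/builder)
      (.setTracerProvider tracer-provider)
      (.build)))

(def tracer (.getTracer sdk "example.sdk"))

(defn shutdown!
  "Hypothetical: force-flushes pending spans, then shuts the provider down.
  Returns true when both complete successfully within the timeout."
  []
  (let [flushed (.join (.forceFlush tracer-provider) 10 TimeUnit/SECONDS)
        closed  (.join (.shutdown tracer-provider) 10 TimeUnit/SECONDS)]
    (and (.isSuccess flushed) (.isSuccess closed))))
```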

TODO: a flowchart to explain the decisions? It could be much simpler than words (agent vs. no agent, auto-instrumentation vs. no auto-instrumentation, Clojure library author).

  • About not running the instantiation twice: open-telemetry/opentelemetry-java#3717
  • https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#forceflush-1
  • Note that there is no need to "set" a tracer by name before getting it. The getTracer method always returns a handle to the same tracing client. The name you provide is to help identify which component generated which spans, and to potentially disable tracing for individual components. We recommend calling getTracer once per component during initialization and retaining a handle to the tracer, rather than calling getTracer repeatedly. This should be configured as early as possible in the entry point of your application. Keep in mind, this builder is not required if the agent is in use.
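Following the recommendation quoted above, this sketch looks the tracer up once per component (here, one namespace) and keeps the handle. Wrapping the lookup in `delay` means it happens on first use rather than when the namespace is loaded. The namespace name and instrumentation name are placeholders.

```clojure
(ns example.component
  (:import (io.opentelemetry.api GlobalOpenTelemetry)
           (io.opentelemetry.api.trace Tracer)))

;; Looked up once, on first deref, then retained for reuse.
(defonce tracer
  (delay (GlobalOpenTelemetry/getTracer "example.component")))

(defn do-work []
  (let [^Tracer t @tracer
        span (.startSpan (.spanBuilder t "do-work"))]
    (try
      :done
      (finally (.end span)))))
```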

Thread safety

Cross-cutting concern

Artifacts etc

  • opentelemetry-java provides the OTel API and the default SDK implementation, along with various plugins such as exporters and span processors. It does not provide instrumentation of any actual library.
  • opentelemetry-java-instrumentation provides two things. First, “library” instrumentation, which you can put on your classpath and use as-is. Second, a full-featured javaagent that does auto-instrumentation via bytecode manipulation.

Whether to use the auto-instrumentation agent

When using the agent, you get the following constraints and caveats:

  • "direct usage of the OpenTelemetry SDK is not supported when running agent" TODO: source
  • ~~The Span implementation when using the auto-instrumentation Java agent differs from the one in the SDK.~~
    • ~~TODO: ask about this on the CNCF Slack. Span doesn't implement the Context interface in the agent, but it does in the API.~~
  • The tracer is built by the agent, so autoconfiguration has to be used (or is there another way?).
  • javaagent version vs. SDK version, see this Slack message:

    We support all OTel APIs up to the version of the javaagent, but not necessarily newer. The assumption is it’s generally at least as easy to update the javaagent version as an app. And while not impossible for all cases, it’s hard to guarantee compatibility with something that doesn’t exist yet

  • open-telemetry/opentelemetry-java-instrumentation#1926

How to leverage supported libraries, frameworks, etc. without running the agent?

Understand which ContextStorage provider is used

Java agent:

  • with the agent: TODO, see how this works
  • without the agent: defaults to ThreadLocalContextStorage
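This sketch checks at the REPL which storage is actually in use. It assumes opentelemetry-api, which brings in opentelemetry-context, is on the classpath. `ContextStorage/get` returns the storage in use. Depending on the version and configuration, that storage may be wrapped, so printing its class is a quick check. The round trip below relies only on the Context API. `with-value` and `current-value` are hypothetical helpers.

```clojure
(ns example.storage
  (:import (io.opentelemetry.context Context ContextKey ContextStorage)))

;; Prints the class name of the ContextStorage currently in use.
(println (.getName (class (ContextStorage/get))))

(def k (ContextKey/named "example-key"))

(defn current-value []
  (.get (Context/current) k))

;; Attaches a value for the duration of f; closing the Scope restores the
;; previous Context on this thread.
(defn with-value [v f]
  (with-open [_scope (.makeCurrent (.with (Context/current) k v))]
    (f)))
```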

Attributes

OTEL documentation:

Implementation choice:
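One possible approach is to convert a Clojure map into an Attributes instance. It is sketched here as a hypothetical helper, not necessarily the choice made in this library. The type-hinted calls let the compiler pick the matching AttributesBuilder.put overload for strings, integers, doubles and booleans. Other values are rejected.

```clojure
(ns example.attributes
  (:import (io.opentelemetry.api.common AttributeKey Attributes AttributesBuilder)))

(defn map->attributes
  "Hypothetical helper. Keys become strings (keywords lose their leading
  colon); values must be strings, integers, doubles or booleans."
  ^Attributes [m]
  (let [^AttributesBuilder b (Attributes/builder)]
    (doseq [[k v] m]
      (let [^String k (if (keyword? k) (subs (str k) 1) (str k))]
        (cond
          (string? v)  (.put b k ^String v)
          (int? v)     (.put b k (long v))
          (double? v)  (.put b k (double v))
          (boolean? v) (.put b k (boolean v))
          :else (throw (ex-info "Unsupported attribute value" {:key k :value v})))))
    (.build b)))
```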

Resource

Baggage

  • link to spec https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md

  • metadata field:

    Metadata Optional metadata associated with the name-value pair. This should be an opaque wrapper for a string with no semantic meaning. Left opaque to allow for future functionality.

  • choices:

    • fromContext returns an empty Baggage if none exists
    • current returns an empty Baggage if none exists
    • fromContextOrNull returns either a Baggage or null (nil)
    • -> we are more used to this in Clojure: nil if the value doesn't exist
    • -> so propose only this: from-context [] -> the current Baggage, or nil; from-context [context] -> fromContextOrNull
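This is a sketch of the proposed `from-context`, assuming opentelemetry-api is on the classpath. Reading the 0-arity as "look in the current Context, nil when there is no baggage" is one interpretation of the note above.

```clojure
(ns example.baggage
  (:import (io.opentelemetry.api.baggage Baggage)
           (io.opentelemetry.context Context)))

(defn from-context
  "Returns the Baggage stored in `context` (the current Context by default),
  or nil when there is none."
  ([] (from-context (Context/current)))
  ([^Context context] (Baggage/fromContextOrNull context)))
```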

Operations:

  • Extract the Baggage from a Context instance
  • Insert the Baggage into a Context instance
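Both operations map onto the Java API. `storeInContext` inserts a Baggage into a Context and returns a new Context, since contexts are immutable. `Baggage/fromContextOrNull` extracts it. This sketch also attaches the opaque metadata string described above. The key, value and metadata strings are placeholders.

```clojure
(ns example.baggage-ops
  (:import (io.opentelemetry.api.baggage Baggage BaggageEntryMetadata)
           (io.opentelemetry.context Context)))

;; Insert: returns a new Context carrying the Baggage.
(def ctx
  (let [baggage (-> (Baggage/builder)
                    (.put "tenant" "acme" (BaggageEntryMetadata/create "opaque"))
                    (.build))]
    (.storeInContext baggage (Context/root))))

;; Extract: nil when the Context carries no Baggage.
(def extracted (Baggage/fromContextOrNull ctx))
```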

Datafy

  • see opentelemetry.datafy
  • must be required explicitly if you want to use it
  • relies on opentelemetry-sdk as well, because some API facades don't allow introspecting data
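Not every type needs the SDK for this. Baggage can already be introspected through the API via `asMap`. The following hypothetical sketch shows what a Datafiable extension for it could look like. It is not the contents of opentelemetry.datafy.

```clojure
(ns example.datafy
  (:require [clojure.core.protocols :as p]
            [clojure.datafy :refer [datafy]])
  (:import (io.opentelemetry.api.baggage Baggage BaggageEntry)))

;; Hypothetical extension: Baggage -> {key {:value ... :metadata ...}}
(extend-protocol p/Datafiable
  Baggage
  (datafy [b]
    (into {}
          (map (fn [[k ^BaggageEntry e]]
                 [k {:value    (.getValue e)
                     :metadata (.getValue (.getMetadata e))}]))
          (.asMap b))))
```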

Concurrency primitives

future

Solution 1: implicitly convey the Context to the Executor.

Downsides:

  • can be intrusive, and it wraps the Agent executor as well
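This sketch of Solution 1 uses `Context/taskWrapping` from opentelemetry-context. That function wraps an ExecutorService so each submitted task runs with the Context that was current when it was submitted. `future` submits to `clojure.lang.Agent/soloExecutor`, which `set-agent-send-off-executor!` replaces. That replacement is the intrusive part, because `send-off` agent actions then go through the wrapper too. The key name and helper functions are hypothetical.

```clojure
(ns example.future-executor
  (:import (io.opentelemetry.context Context ContextKey)))

;; Process-wide change: affects send-off as well as future.
(set-agent-send-off-executor!
  (Context/taskWrapping clojure.lang.Agent/soloExecutor))

(def k (ContextKey/named "request-id"))

(defn current-request-id []
  (.get (Context/current) k))

;; The value made current on the calling thread is visible inside the future.
(defn request-id-in-future [id]
  (with-open [_scope (.makeCurrent (.with (Context/current) k id))]
    @(future (current-request-id))))
```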

Solution 2: wrap in a lexical scope
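This sketch of Solution 2 leaves the executor untouched. It captures the Context on the calling thread and makes it current inside the future's body for the duration of a lexical scope. `future-with-context` is a hypothetical macro name.

```clojure
(ns example.future-scope
  (:import (io.opentelemetry.context Context ContextKey)))

(defmacro future-with-context
  "Hypothetical: like `future`, but makes the caller's Context current in body."
  [& body]
  `(let [ctx# (Context/current)]          ; evaluated on the calling thread
     (future
       (with-open [_scope# (.makeCurrent ctx#)]
         ~@body))))

(def k (ContextKey/named "request-id"))

(defn current-request-id []
  (.get (Context/current) k))
```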

core.async

We have to differentiate two use cases:

  1. We use a go block or a thread: wrap in a lexical scope.
     • Warning: a go block can park and resume on another thread, so a macro is needed to help maintain the correct thread-local storage.
  2. We have neither, and the value has to be conveyed in a channel: TODO example code, for example with aleph.
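This sketch covers both cases, assuming core.async and opentelemetry-api are on the classpath. In a go block, a Scope must not stay open across a parking operation such as `<!`. The block may resume on another thread, and the thread-local Context would then be attached to the wrong thread. The first case therefore takes the Context as a value and makes it current only around non-parking code. In the second case, the Context travels through the channel together with the message. All function names are hypothetical.

```clojure
(ns example.async
  (:require [clojure.core.async :as a :refer [go <! chan]])
  (:import (io.opentelemetry.context Context ContextKey)))

(def k (ContextKey/named "request-id"))

(defn current-request-id []
  (.get (Context/current) k))

;; Case 1: re-attach a captured Context only between parking points.
(defn handle-in-go [ctx in]
  (go
    (let [msg (<! in)]
      ;; No parking inside this scope, so it opens and closes on one thread.
      (with-open [_scope (.makeCurrent ctx)]
        [msg (current-request-id)]))))

;; Case 2: capture the caller's Context now and send it with the message.
(defn send-with-context [out msg]
  (a/put! out {:context (Context/current) :msg msg}))

(defn receive-with-context [in]
  (go
    (let [{:keys [context msg]} (<! in)]
      (with-open [_scope (.makeCurrent context)]
        [msg (current-request-id)]))))
```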

Roadmap:

  1. Wrapper for the OTel Tracing API, with unopinionated solutions for Context conveyance across threads in the usual Clojure stack:
     • future, core.async go and thread, core.async channels, manifold, CompletableFuture, etc.
  2. Provide extensive examples and documentation for the usual Clojure stack:
     • jetty, netty, logback config
  3. Load-test an example app with correctness validation (convey the trace ID in the request body to assert that the correct spans appear down the execution path), plus profiling, etc.

TODO :