AgiFlow/agiflow-sdks

Incompatibilities with OpenTelemetry LLM semantics pending release

Closed this issue · 8 comments

I work on the OpenTelemetry LLM semantics SIG, and did an evaluation of the SDK based on the following sample code and what the pending semantic conventions release 1.27.0 will define.

Note: I'm doing this unsolicited on all the various Python instrumentations for OpenAI, so this is not a specific call-out that AGIFlow is notably different here. I wanted to warn you about some drift so that ideally you'll be in a position to adjust once the release occurs, or to clarify if that's not a goal. I would welcome you to join the #otel-llm-semconv-wg Slack channel and any SIG meetings if you find this relevant!

Sample code

import os
from agiflow import Agiflow
from openai import OpenAI
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Initialize otel exporter and AGIFlow instrumentation
app_name = "agiflow-python-ollama"
otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "http://localhost:4318/v1/traces")
otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
Agiflow.init(app_name=app_name, exporter=otlp_exporter)

def main():
    ollama_host = os.getenv('OLLAMA_HOST', 'localhost')
    # Use the OpenAI endpoint, not the Ollama API.
    base_url = 'http://' + ollama_host + ':11434/v1'
    client = OpenAI(base_url=base_url, api_key='unused')
    messages = [
      {
        'role': 'user',
        'content': '<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>',
      },
    ]
    chat_completion = client.chat.completions.create(model='codegemma:2b-code', messages=messages)
    print(chat_completion.choices[0].message.content)

if __name__ == "__main__":
    main()

Evaluation

Semantic evaluation on spans.

compatible:

  • kind=Client

missing:

  • attributes['gen_ai.operation.name']='chat'
  • attributes['gen_ai.system']='ollama'

incompatible:

  • name=Completions (should be 'chat codegemma:2b-code')
  • attributes['llm.type']='Chat' (should be 'gen_ai.operation.name' and lowercase)
  • attributes['llm.prompts']='[{"role": "user", "content": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>"}]' (should be the event attribute 'gen_ai.prompt')
  • attributes['llm.model']='codegemma:2b-code' (should be 'gen_ai.request.model')
  • attributes['llm.responses']='[{"content": "print(\"Hello, world!\")", "role": "assistant"}]' (should be the event attribute 'gen_ai.completion')
  • attributes['llm.token.counts']='{"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36}' (should be split into 'gen_ai.usage.input_tokens' and 'gen_ai.usage.output_tokens')

not yet defined in the standard:

  • attributes['openai.api_base']='http://localhost:11434/v1/'
  • attributes['llm.api']='/chat/completions'
  • attributes['llm.system.fingerprint']='fp_ollama'

defined by other semantics:

  • attributes['url.full']='http://localhost:11434/v1/'

vendor specific:

  • attributes['agiflow.sdk.name']='agiflow-python-sdk'
  • attributes['agiflow.sdk.version']='0.0.23'
  • attributes['agiflow.service.name']='OpenAI'
  • attributes['agiflow.service.type']='LLM'
  • attributes['agiflow.service.version']='1.37.0'
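
To make the renames concrete, here is a minimal sketch of the attribute mapping implied above (the helper name and shape are mine, not agiflow-sdk code; prompts and completions are omitted since they move to event attributes):

import json

# Hypothetical helper mapping the SDK's current span attributes to the
# pending gen_ai.* names; illustration only, not part of agiflow-sdk.
def to_gen_ai_attributes(old: dict) -> dict:
    counts = json.loads(old["llm.token.counts"])
    return {
        "gen_ai.operation.name": old["llm.type"].lower(),  # "Chat" -> "chat"
        "gen_ai.request.model": old["llm.model"],
        "gen_ai.usage.input_tokens": counts["prompt_tokens"],
        "gen_ai.usage.output_tokens": counts["completion_tokens"],
    }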

Semantic evaluation on metrics:

N/A as no metrics are currently recorded
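
For reference, here is a sketch of the client metrics the pending conventions define, in case you add them later (instrument names per the gen_ai semconv; token counts and duration taken from the span above):

from opentelemetry import metrics

meter = metrics.get_meter("example")

# Sketch of the GenAI client metrics in the pending conventions.
token_usage = meter.create_histogram(
    "gen_ai.client.token.usage", unit="{token}",
    description="Number of input and output tokens used")
duration = meter.create_histogram(
    "gen_ai.client.operation.duration", unit="s",
    description="GenAI operation duration")

attrs = {"gen_ai.operation.name": "chat", "gen_ai.system": "ollama",
         "gen_ai.request.model": "codegemma:2b-code"}
token_usage.record(24, {**attrs, "gen_ai.token.type": "input"})
token_usage.record(12, {**attrs, "gen_ai.token.type": "output"})
duration.record(0.66, attrs)  # ~0.66s per the span start/end times above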

Example collector log

otel-collector      | 2024-07-24T05:08:59.563Z  info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
otel-collector      | 2024-07-24T05:08:59.563Z  info    ResourceSpans #0
otel-collector      | Resource SchemaURL: 
otel-collector      | Resource attributes:
otel-collector      |      -> service.name: Str(agiflow-python-ollama)
otel-collector      |      -> service.version: Str()
otel-collector      |      -> telemetry.sdk.name: Str(AGIFlow)
otel-collector      |      -> telemetry.sdk.version: Str(0.0.23)
otel-collector      | ScopeSpans #0
otel-collector      | ScopeSpans SchemaURL: 
otel-collector      | InstrumentationScope agiflow.opentelemetry.instrumentation.openai.instrumentation 0.0.23
otel-collector      | Span #0
otel-collector      |     Trace ID       : 3d39854a707e30493a3400ba75d0cfc0
otel-collector      |     Parent ID      : 
otel-collector      |     ID             : de037108c788013f
otel-collector      |     Name           : Completions
otel-collector      |     Kind           : Client
otel-collector      |     Start time     : 2024-07-24 05:08:58.820558 +0000 UTC
otel-collector      |     End time       : 2024-07-24 05:08:59.4852 +0000 UTC
otel-collector      |     Status code    : Ok
otel-collector      |     Status message : 
otel-collector      | Attributes:
otel-collector      |      -> agiflow.sdk.name: Str(agiflow-python-sdk)
otel-collector      |      -> agiflow.sdk.version: Str(0.0.23)
otel-collector      |      -> agiflow.service.name: Str(OpenAI)
otel-collector      |      -> agiflow.service.type: Str(LLM)
otel-collector      |      -> agiflow.service.version: Str(1.37.0)
otel-collector      |      -> openai.api_base: Str(http://localhost:11434/v1/)
otel-collector      |      -> url.full: Str(http://localhost:11434/v1/)
otel-collector      |      -> llm.api: Str(/chat/completions)
otel-collector      |      -> llm.type: Str(Chat)
otel-collector      |      -> llm.prompts: Str([{"role": "user", "content": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>"}])
otel-collector      |      -> llm.model: Str(codegemma:2b-code)
otel-collector      |      -> llm.responses: Str([{"content": "print(\"Hello, world!\")", "role": "assistant"}])
otel-collector      |      -> llm.system.fingerprint: Str(fp_ollama)
otel-collector      |      -> llm.token.counts: Str({"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36})
otel-collector      |   {"kind": "exporter", "data_type": "traces", "name": "debug"}

Thanks @codefromthecrypt , really appreciate your time running an evaluation on agiflow-sdk. It's definitely our goal to keep the telemetry adhering to the standard. Thanks for pointing us in the right direction; we'll get the next few releases aligned with the semantics release.

Hi @codefromthecrypt , I've created a PR that fixes the incompatibilities with the GenAI semconv. Would you mind giving this branch a quick test, or letting me know how to run the check so it's easier?
Also, I'm a bit confused about which identifier should be given to gen_ai.system: is it bound to the vendor name or the library name? And should it be added to the API span only?

Sorry about missing this.

For the span, this is the logical span representing, say, an openai call. Ack that there is an http call underneath the openai library abstraction; right now, I didn't notice any subspans. So, basically, this is the span representing the library call. If you've recently also added an http child span, that's cool; the spec is just about the application-layer one.

For gen_ai.system, the docs currently have this (I'll add ollama at some point soon):

gen_ai.system has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

Value      Description
anthropic  Anthropic
cohere     Cohere
openai     OpenAI
vertex_ai  Vertex AI
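
As a concrete sketch for the ollama case above (a custom value until 'ollama' joins the well-known list; the tracer name and span shape here are illustrative, not prescribed):

from opentelemetry import trace

tracer = trace.get_tracer("example")

# Sketch only: per the evaluation, an ollama-served model uses the custom
# value 'ollama', set on the span representing the library call.
with tracer.start_as_current_span(
        "chat codegemma:2b-code", kind=trace.SpanKind.CLIENT) as span:
    span.set_attribute("gen_ai.system", "ollama")
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.request.model", "codegemma:2b-code")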

For testing, I've been using the code pasted into the description above. As I'm not sure how to add a pip dep on a branch, you could either run the code and paste the collector output, or tell me how to use your branch. I use this Pipfile:

url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
agiflow-sdk = "*"
openai = "*"

[dev-packages]

[requires]
python_version = "3.12"

Thanks @codefromthecrypt , that sounds great to me.

Automatic http tracing is currently not supported, as we were getting lots of empty traces on Azure Functions. We currently support automatic tracing only for LLM libraries; customers can add http traces via the extra_instrumentations argument when initializing the library.

I've added a sample app to app/agiflow-sdk-samples with the script you provided. Hope that helps!

Also, noting the working group messages, I think for now we'll leave the prompt/completion captured on the gen_ai.prompt and gen_ai.completion span attributes, and will add events support in another release.
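
(For reference, the span-event shape described by the pending release looks roughly like this; a sketch assuming the gen_ai.content.prompt / gen_ai.content.completion event names, with the helper and its arguments as placeholders:)

import json

from opentelemetry.trace import Span

# Sketch: capture prompt/completion as span events rather than span
# attributes, assuming the gen_ai.content.* event names in the pending
# conventions. The helper and its arguments are placeholders.
def add_content_events(span: Span, messages: list, choices: list) -> None:
    span.add_event("gen_ai.content.prompt",
                   attributes={"gen_ai.prompt": json.dumps(messages)})
    span.add_event("gen_ai.content.completion",
                   attributes={"gen_ai.completion": json.dumps(choices)})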

Thanks, I'll follow up more here, but you may want to look at open-telemetry/semantic-conventions#1315 (comment); I don't remember if you have already. Possibly you could comment with your experience on this topic, even if it doesn't end up being about INTERNAL vs CLIENT span kind.

Thanks, and I'm personally done here until something else comes up. Your PR is very close.


Made a comment in that PR, noting your exception on the span events -> attributes part. This isn't different from openllmetry, which also doesn't follow span events at the moment, apart from choices of how to represent the attributes.

This mainly impacts backend portability: when key data is recorded differently, it is hard for folks to make portable visualization or analysis tools. The event API could land soon, but it could also be a very long way off, and it will take even longer for all backends to use it. So, basically, this lack of portability will last at least that long.

Knowing this, I would expect that regardless of what the spec says, even once the (log) event API exists, instrumentation might have a config toggle to use span events. I would bet $5 on it, but not the house ;)

Anyway, between now and then, those really wanting to normalize on this point could rewrite the data in a custom exporter or in the collector (maybe with the transform processor), since the data layout is known.
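
For example, a collector-side rewrite could look roughly like this (a sketch using the contrib transform processor's OTTL, showing just two of the renames):

processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Copy each old attribute to its gen_ai.* name, then drop the old one.
          - set(attributes["gen_ai.request.model"], attributes["llm.model"]) where attributes["llm.model"] != nil
          - delete_key(attributes, "llm.model")
          - set(attributes["gen_ai.operation.name"], "chat") where attributes["llm.type"] == "Chat"
          - delete_key(attributes, "llm.type")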

Yes, I've updated the SDK to support span events now. I checked our backend code; it should be simple to support span events there too.

I will check out the GitHub issue about INTERNAL vs CLIENT spans soon; thanks for sharing that.

@codefromthecrypt , thanks again for your help! agiflow-sdk v0.0.24 has been released with the gen_ai semconv fixes.