Incompatibilities with OpenTelemetry LLM semantics pending release
Closed this issue · 8 comments
I work on the OpenTelemetry LLM semantics SIG, and I evaluated the SDK using the sample code below against what the pending 1.27.0 semantic conventions release will define.
Note: I'm doing this unsolicited across the various Python instrumentations for OpenAI, so this is not a specific callout that AGIFlow is notably different here. I wanted to warn you about some drift so that, ideally, you'll be in a position to adjust once the release occurs, or to clarify if that's not a goal. I'd welcome you to join the #otel-llm-semconv-wg Slack and any SIG meetings if you find this relevant!
Sample code
import os
from agiflow import Agiflow
from openai import OpenAI
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
# Initialize otel exporter and AGIFlow instrumentation
app_name = "agiflow-python-ollama"
otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "http://localhost:4318/v1/traces")
otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
Agiflow.init(app_name=app_name, exporter=otlp_exporter)
def main():
    ollama_host = os.getenv('OLLAMA_HOST', 'localhost')
    # Use the OpenAI endpoint, not the Ollama API.
    base_url = 'http://' + ollama_host + ':11434/v1'
    client = OpenAI(base_url=base_url, api_key='unused')
    messages = [
        {
            'role': 'user',
            'content': '<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>',
        },
    ]
    chat_completion = client.chat.completions.create(model='codegemma:2b-code', messages=messages)
    print(chat_completion.choices[0].message.content)

if __name__ == "__main__":
    main()
Evaluation
Semantic evaluation on spans.
compatible:
- kind=Client
missing:
- attributes['gen_ai.operation.name']='chat'
- attributes['gen_ai.system']='ollama'
incompatible:
- name=Completions (should be 'chat codegemma:2b-code')
- attributes['llm.type']='Chat' (should be 'gen_ai.operation.name' and lowercase)
- attributes['llm.prompts']='[{"role": "user", "content": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>"}]' (should be the event attribute 'gen_ai.prompt')
- attributes['llm.model']='codegemma:2b-code' (should be 'gen_ai.request.model')
- attributes['llm.responses']='[{"content": "print("Hello, world!")", "role": "assistant"}]' (should be the event attribute 'gen_ai.completion')
- attributes['llm.token.counts']='{"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36}' (should be split into 'gen_ai.usage.input_tokens' and 'gen_ai.usage.output_tokens')
not yet defined in the standard:
- attributes['openai.api_base']='http://localhost:11434/v1/'
- attributes['llm.api']='/chat/completions'
- attributes['llm.system.fingerprint']='fp_ollama'
defined by other semantics:
- attributes['url.full']='http://localhost:11434/v1/'
vendor specific:
- attributes['agiflow.sdk.name']='agiflow-python-sdk'
- attributes['agiflow.sdk.version']='0.0.23'
- attributes['agiflow.service.name']='OpenAI'
- attributes['agiflow.service.type']='LLM'
- attributes['agiflow.service.version']='1.37.0'
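To make the target shape concrete, here is a minimal sketch of a span matching the pending conventions, written against the plain OpenTelemetry Python API rather than AGIFlow's instrumentation; the event names are my reading of the pending release and may shift before it lands:

from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("demo")

# Span name is '{operation} {model}'; the kind stays CLIENT.
with tracer.start_as_current_span("chat codegemma:2b-code", kind=SpanKind.CLIENT) as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.system", "ollama")
    span.set_attribute("gen_ai.request.model", "codegemma:2b-code")
    # Token usage is split into two integer attributes, not one JSON blob.
    span.set_attribute("gen_ai.usage.input_tokens", 24)
    span.set_attribute("gen_ai.usage.output_tokens", 12)
    # Prompt/completion content moves to span events; event names here are
    # an assumption based on the pending 1.27.0 release, not final.
    span.add_event("gen_ai.content.prompt", {"gen_ai.prompt": '[{"role": "user", "content": "..."}]'})
    span.add_event("gen_ai.content.completion", {"gen_ai.completion": '[{"role": "assistant", "content": "..."}]'})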
Semantic evaluation on metrics:
N/A as no metrics are currently recorded
Example collector log
otel-collector | 2024-07-24T05:08:59.563Z info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
otel-collector | 2024-07-24T05:08:59.563Z info ResourceSpans #0
otel-collector | Resource SchemaURL:
otel-collector | Resource attributes:
otel-collector | -> service.name: Str(agiflow-python-ollama)
otel-collector | -> service.version: Str()
otel-collector | -> telemetry.sdk.name: Str(AGIFlow)
otel-collector | -> telemetry.sdk.version: Str(0.0.23)
otel-collector | ScopeSpans #0
otel-collector | ScopeSpans SchemaURL:
otel-collector | InstrumentationScope agiflow.opentelemetry.instrumentation.openai.instrumentation 0.0.23
otel-collector | Span #0
otel-collector | Trace ID : 3d39854a707e30493a3400ba75d0cfc0
otel-collector | Parent ID :
otel-collector | ID : de037108c788013f
otel-collector | Name : Completions
otel-collector | Kind : Client
otel-collector | Start time : 2024-07-24 05:08:58.820558 +0000 UTC
otel-collector | End time : 2024-07-24 05:08:59.4852 +0000 UTC
otel-collector | Status code : Ok
otel-collector | Status message :
otel-collector | Attributes:
otel-collector | -> agiflow.sdk.name: Str(agiflow-python-sdk)
otel-collector | -> agiflow.sdk.version: Str(0.0.23)
otel-collector | -> agiflow.service.name: Str(OpenAI)
otel-collector | -> agiflow.service.type: Str(LLM)
otel-collector | -> agiflow.service.version: Str(1.37.0)
otel-collector | -> openai.api_base: Str(http://localhost:11434/v1/)
otel-collector | -> url.full: Str(http://localhost:11434/v1/)
otel-collector | -> llm.api: Str(/chat/completions)
otel-collector | -> llm.type: Str(Chat)
otel-collector | -> llm.prompts: Str([{"role": "user", "content": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>"}])
otel-collector | -> llm.model: Str(codegemma:2b-code)
otel-collector | -> llm.responses: Str([{"content": "print(\"Hello, world!\")", "role": "assistant"}])
otel-collector | -> llm.system.fingerprint: Str(fp_ollama)
otel-collector | -> llm.token.counts: Str({"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36})
otel-collector | {"kind": "exporter", "data_type": "traces", "name": "debug"}
Thanks @codefromthecrypt , really appreciate your time running the evaluation on agiflow-sdk. It's definitely our goal to keep the telemetry adhering to the standard. Thanks for pointing us in the right direction; we'll get the next few releases aligned with the semconv release.
Hi @codefromthecrypt , I've created a PR that fixes the incompatibility with the GenAI semconv. Would you mind giving this branch a quick test, or letting me know how to run the check, to make it easier?
Also, I'm a bit confused about which identifier should be given to gen_ai.system: is it bound to the vendor name or the library name? And should it be added to the API span only?
Sorry about missing this.
For the span, this is the logical span representing, say, an OpenAI call. Ack that there is an HTTP call underneath the OpenAI library abstraction; right now, I didn't notice any subspans. So, basically, this is the span representing the library call. If you've recently also added an HTTP child span, that's cool; the spec is just about the application-layer one.
For gen_ai.system, the docs currently have this (I'll add ollama at some point soon):
gen_ai.system has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description
---|---
anthropic | Anthropic
cohere | Cohere
openai | OpenAI
vertex_ai | Vertex AI
For testing I've been using the code above, pasted into the description. As I'm not sure how to add a pip dependency on a branch, you could either run the code and paste the collector output, or tell me how to use your branch. I use this Pipfile:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
agiflow-sdk = "*"
openai = "*"
[dev-packages]
[requires]
python_version = "3.12"
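For reference, Pipenv can also point a dependency at a git branch; a minimal sketch, where the repository URL and branch name are placeholders rather than the real ones:

[packages]
# Hypothetical URL/branch; substitute the actual repository and branch.
agiflow-sdk = {git = "https://github.com/example/agiflow-sdk.git", ref = "fix-genai-semconv"}
openai = "*"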
Thanks @codefromthecrypt , that sounds great to me.
Automatic HTTP tracing is currently not supported, as we were seeing lots of empty traces on Azure Functions. Currently we support automatic traces only for LLM libraries; customers can add HTTP traces via the extra_instrumentations argument when initializing the library, as sketched below.
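A minimal sketch of that opt-in, assuming extra_instrumentations accepts standard OpenTelemetry instrumentor instances (the RequestsInstrumentor choice is illustrative, not documented AGIFlow API):

from agiflow import Agiflow
# Assumption: standard OpenTelemetry instrumentors can be passed through here.
from opentelemetry.instrumentation.requests import RequestsInstrumentor

Agiflow.init(
    app_name="agiflow-python-ollama",
    extra_instrumentations=[RequestsInstrumentor()],  # opt-in HTTP client traces
)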
I've added a sample app to app/agiflow-sdk-samples with the script you provided. Hope that helps!
Also, noting from the working group messages, I think for now we'll leave prompt/completion captured on gen_ai.prompt and gen_ai.completion, and will add events support in another release.
Thanks, I'll follow up more here, but you may want to look at open-telemetry/semantic-conventions#1315 (comment); I don't remember if you had already. Possibly you can comment with your experience on this topic, even if it doesn't end up being about INTERNAL vs CLIENT span kind.
Thanks, and I'm personally done here until something else comes up. Your PR is very close.
Made a comment in that PR, noting your exception on the span events -> attributes part. This isn't different from OpenLLMetry, which also doesn't follow span events at the moment, apart from choices of how to represent attributes.
This mainly impacts backend portability: when key data is recorded differently, it is hard for folks to build portable visualization or analysis tools. The Events API could land soon, but it also could be a very long way away, and even longer for all backends to adopt it. So, basically, this lack of portability will last at least that long.
Knowing this, I would expect that regardless of what the spec says, even once the (log) Events API exists, instrumentation might keep a config toggle to use span events. I would bet $5 but not the house ;)
Anyway, between now and then, those really wanting to normalize on this point could rewrite the data in a custom exporter or in the collector, maybe with the transform processor, since the data layout is known; a sketch follows.
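For example, a minimal collector-config sketch (assuming the contrib transform processor; the rest of the pipeline config is omitted) that renames one of the attributes above to its semconv equivalent:

processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Copy the vendor attribute to the semconv name, then drop the original.
          - set(attributes["gen_ai.request.model"], attributes["llm.model"])
          - delete_key(attributes, "llm.model")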
Yes, I've updated it to support span events now. Checked our backend code; it should be simple to support span events.
I will check out the GitHub issue on INTERNAL vs CLIENT spans soon, thanks for sharing that.
@codefromthecrypt , thanks again for your help! agiflow-sdk v0.0.24 is released with the gen_ai semconv fixes.