LangTrace - Trace Attributes

This repository hosts the JSON schema definitions and the generated model code for both Python and TypeScript. It's designed to streamline the development process across different programming languages, ensuring consistency in data structure and validation logic. The repository includes tools for automatically generating model code from JSON schema definitions, simplifying the task of keeping model implementations synchronized with schema changes.

Repository Structure

/
├── schemas/                      # JSON schema definitions
│   └── openai_span_attributes.json
├── scripts/                      # Shell scripts for model generation
│   └── generate_python.sh
├── generated/                    # Generated model code
│   ├── python/                   # Python models
│   └── typescript/               # TypeScript interfaces
├── package.json
├── requirements.txt
├── README.md
└── .gitignore

Prerequisites

Before you begin, make sure you have the following installed on your system:

Node.js and npm
Python and pip
ts-node for running TypeScript scripts directly (install globally via npm install -g ts-node)
datamodel-code-generator for Python model generation (install via pip install datamodel-code-generator)

Generating Models

Python Models

To generate Python models from a JSON schema, use the generate_python.sh script located in the scripts directory. This script takes the path to a JSON schema file as an argument and generates a Python model in the generated/python directory.

./scripts/generate_python.sh schemas/llm_span_attributes.json

TypeScript Interfaces

To generate TypeScript interfaces from a JSON schema, use the scripts/generate_typescript.sh script located in the scripts directory. This script also takes the path to a JSON schema file as an argument and generates a TypeScript interface in the src/typescript/models directory. t

(cd src/typescript && npm i)
./scripts/generate_typescript.sh schemas/llm_span_attributes.json

OpenTelemetry Semantic Attributes

Service Type	Name	Type/Schema	Description
LLM	llm.prompts	[{role: string, content: string}]	Captures the input messages given to the LLM. It includes the prompt with role "System" and any "user" and "assistant" messages along with the history. Notes: 1. Prompts are standardized for every LLM vendor. 2. The "system" role will always represent the system prompt passed. Ex: The preamble parameter passed to the cohere API is appended to the system prompt and captured within llm.prompts.
LLM	llm.responses	[{role: string, content: string}]	Captures the output messages given by the LLM. Notes: 1. For image generation, content is an object which has, 'url' which is the url of the image and any other properties that gets attached with it based on the LLM vendor. 2. For tool calling, the list includes role, content and additional properties like tool_id depending on the LLM vendor.
LLM	llm.token.counts	llm.token.counts: { input_tokens: number, output_tokens: number, total_tokens: number }	Captures the token counts used with the request including input, output and total tokens. Notes: 1. For streaming mode, some LLM vendors like OpenAI do not have the token counts. So, this metric calculates the token counts for each stream chunk using the tiktoken library. As a result, it may not be accurate. 2. For cohere, this captures the billed units. And also captures the search_units when search capabilities are used.
LLM	llm.api	string	The endpoint being invoked. Ex: /chat/completions
LLM	llm.model	string	The model used for the call. The model is captured from the response and not from the request. Response has the accurate model name. Ex: Passing "gpt-4" in the request can result in "gpt-4-0613" in the response depending on the version of gpt-4 being used. This is more accurate description of the model used for the call.
LLM	llm.temprature	number	The temperature setting used
LLM	llm.top_p	number	Top P setting
LLM	llm.top_k	number	Top K setting Note: 1. For LLMs that support top_n, the argument is captured in this attribute as both top_k and top_n represent the same thing.
LLM	llm.user	string	This is an LLM request parama for identifying the user originating this request. Not to be confused with the user.id attribute passed to the langtrace SDK using with_additional_attributes option.
LLM	llm.system.fingerprint	string	The system fingerprint parameter passed to the API.
LLM	llm.stream	boolean	Whether or not streaming is used
LLM	llm.encoding.formats	[string]	Mainly applies to Embedding models. List of encoding formats used for embedding.
LLM	llm.dimensions	string	The number of dimensions the resulting output embeddings should have
LLM	llm.generation_id	string	Captures the generation_id from a response if any.
LLM	llm.response_id	string	Captures the response_id from a response if any.
LLM	llm.citations	[object]	List of citations from cohere’s response. Serialized as is without any mutation to apply any standardization. Cohere Documentation on Documents and Citations
LLM	llm.documents	[object]	Serialized list of documents passed to the rerank API of cohere. This primarily applies to retrieval models and serialized as is without any mutation to apply any standardization.
LLM	llm.frequency_penalty	string	Frequency penalty if passed
LLM	llm.presence_penalty	string	Presence penalty if passed
LLM	llm.connectors	[object]	Applies mainly for cohere. Serialized directly without mutation.
LLM	llm.tools	[object]	The list of tools or functions available for the LLM to take a decision on. There is no standardization applied for the schema and serialized as is for different LLM vendors.
LLM	llm.tool_results	[object]	For LLM vendors that require tool_results passed as a separate parameter with the request. Ex: Cohere. For OpenAI, tool results are part of the messages parameter and are captured with llm.prompts.
LLM	llm.embedding_inputs	[string]	Captures the input strings provided to the embedding model.
LLM	llm.embedding_dataset_id	string	Applies only for cohere
LLM	llm.embedding_input_type	string	Applies only for cohere
LLM	llm.embedding_job_name	string	Applies only for the embed_job API for cohere.
LLM	llm.retrieval.query	string	Query passed to the retrieval model. Ex: Cohere Rerank
LLM	llm.retrieval.results	[string]	Serialized array of objects returned by a retrieval model that usually includes the score and the index of the documents passed.
VectorDB	server.address	string	Captures the DB server address if found
VectorDB	db.operation	string	Operations of a vectorDB - add, delete, query, peek etc.
VectorDB	db.system	string	Captures the db - chromedb, pinecone etc.
VectorDB	db.namespace	string	Namespace of the database
VectorDB	db.index	string	Index passed to the database if any
VectorDB	db.collection.name	string	Captures the collection name where vectors are stored that the operation is querying.
VectorDB	db.pinecone.top_k	string	Captures the top_k value for KNN search
VectorDB	db.chromadb.embedding_model	string	Captures the embedding model used with chromadb
Framework	http://langchain.task.name/angchain.task.name	string	Short term that indicates what task the framework is performing. The names are framework specific. Currently it could be one of the following: load_pdf, vector_store, split_text, retriever, prompt, runnable, runnablepassthrough, jsonoutputparser, stroutputparser, listoutputparser, xmloutputparser.
Framework	langchain.inputs	string	Serialized inputs to the function call
Framework	langchain.outputs	string	Serialized outputs of the function call
Framework	llamaindex.task.name	string	Short term that indicates what task the framework is performing. Currently it could be one of the following - query, retrieve, extract, aextract, load_data, chat, achat
Framework	llamaindex.inputs	string	Serialized inputs to the function call
Framework	llamaindex.outputs	string	Serialized outputs of the function call
Langtrace	user.feedback.rating	number	This is useful for capturing the feedback provided by the user of the application for an LLM’s response. Ex: a user hitting a thumbs up or down for a chatbot’s response.
Langtrace	user.id	string	This is application specific and can be optionally passed using the with_additional_attributes option from the SDK for tying users to requests. More details: Langtrace Trace User Feedback
Langtrace	langtrace.testId	string	Unique id of the test generated within langtrace for capturing requests to a specific test bucket. Useful for evaluating a set of requests against a specific test. Ex: A test for measuring factual accuracy.
Langtrace	langtrace.service.name	string	Captures the service name - Ex: openai, llamaindex etc.
Langtrace	langtrace.service.type	string	Captures the service type - It can be one of the below 3 - LLM - VectorDB - Framework
Langtrace	langtrace.service.version	string	Version of the library being used: Ex: 3.0.0 represents the 3.0.0 version of openai python library
Langtrace	langtrace.sdk.name	string	Langtrace SDK that is generating this span. Currently its typescript or python.
Langtrace	langtrace.version	string	Langtrace SDK version.

Contributing

Contributions are welcome! If you'd like to add a new schema or improve the existing model generation process, please follow these steps:

Fork the repository.
Create a new branch for your feature or fix.
Make your changes.
Test your changes to ensure the generated models are correct.
Submit a pull request with a clear description of your changes.

License

This project is licensed under the Apache 2.0. See the LICENSE file for more details.

Scale3-Labs/langtrace-trace-attributes