langchain-ai/langchain

Anthropic: Usage metadata is inaccurate for prompt cache reads/writes

Closed this issue · 2 comments

Checked other resources

  • This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code

import httpx
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage

ADVENTURES_OF_SHERLOCK_HOLMES = httpx.get(
    "https://www.gutenberg.org/ebooks/1661.txt.utf-8", follow_redirects=True
).text

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Summarize the adventures of Sherlock Holmes.",
                "cache_control": {"type": "ephemeral", "ttl": "5m"},
            }
        ],
    },
]

system = SystemMessage(
    content=[
        {
            "type": "text",
            "text": "You are a helpful assistant that is a literary expert. Here's the text of the adventures of Sherlock Holmes.",
        },
        {
            "type": "text",
            "text": ADVENTURES_OF_SHERLOCK_HOLMES,
            "cache_control": {"type": "ephemeral", "ttl": "5m"},
        },
    ]
)


llm = ChatAnthropic(model_name="claude-sonnet-4-20250514")
final_chunk = None

for chunk in llm.stream(input=[system, *messages]):
    final_chunk = chunk if final_chunk is None else final_chunk + chunk

print(final_chunk.usage_metadata)

ai_message = llm.invoke(input=[system, *messages])
print(ai_message.usage_metadata)

Error Message and Stack Trace (if applicable)

No response

Description

I am making a simple request to Anthropic that uses prompt caching for both the system prompt and the messages array. With a warmed cache (e.g. after running the script once beforehand):

# streaming `usage_metadata` is
{'input_tokens': 151998, 'output_tokens': 691, 'total_tokens': 152689, 'input_token_details': {'cache_creation': 0, 'cache_read': 151995}}

# invoke() `usage_metadata` is
{'input_tokens': 151998, 'output_tokens': 614, 'total_tokens': 152612, 'input_token_details': {'cache_read': 151995, 'cache_creation': 0, 'ephemeral_5m_input_tokens': 0, 'ephemeral_1h_input_tokens': 0}}

However, the actual server response (captured via `mitmproxy --mode reverse:https://api.anthropic.com -p 8888`) reveals the following for stream():

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":3,"cache_creation_input_tokens":0,"cache_read_input_tokens":151995,"output_tokens":691}}

And for invoke():

    "usage": {
        "input_tokens": 3,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 151995,
        "cache_creation": {
            "ephemeral_5m_input_tokens": 0,
            "ephemeral_1h_input_tokens": 0
        },
        "output_tokens": 614,
        "service_tier": "standard"
    }

`usage_metadata["input_tokens"]` therefore does not match the `input_tokens` value the API actually returned. To reconstruct the raw non-cached count from LangChain's metadata, one would have to compute:

usage_metadata["input_tokens"] - usage_metadata["input_token_details"]["cache_read"] - usage_metadata["input_token_details"]["cache_creation"]
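As a concrete check, here is that computation applied to the invoke() numbers above (a minimal sketch using the values from this report, not LangChain internals):

```python
# usage_metadata as reported by LangChain's invoke() in the example above
usage_metadata = {
    "input_tokens": 151998,
    "output_tokens": 614,
    "total_tokens": 152612,
    "input_token_details": {"cache_read": 151995, "cache_creation": 0},
}

details = usage_metadata["input_token_details"]

# Subtract the cached token counts to recover the raw non-cached
# `input_tokens` value that the Anthropic API itself returned.
non_cached_input_tokens = (
    usage_metadata["input_tokens"]
    - details["cache_read"]
    - details["cache_creation"]
)

print(non_cached_input_tokens)  # 3, matching the raw server response
```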

If this is intended, it needs to be clarified in docstrings and documentation.

System Info

System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:51 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8112
> Python Version:  3.13.2 (main, Feb 12 2025, 14:59:08) [Clang 19.1.6 ]

Package Information
-------------------
> langchain_core: 0.3.75
> langchain: 0.3.25
> langchain_community: 0.3.21
> langsmith: 0.3.45
> langchain_anthropic: 0.3.19
> langchain_aws: 0.2.26
> langchain_cohere: 0.4.4
> langchain_google_genai: 2.1.8
> langchain_mistralai: 0.2.10
> langchain_ollama: 0.3.4
> langchain_openai: 0.3.30
> langchain_text_splitters: 0.3.11
> langchain_xai: 0.2.5
> langgraph_sdk: 0.1.70

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp<4,>=3.9.1: Installed. No version info available.
> aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
> anthropic<1,>=0.64.0: Installed. No version info available.
> async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
> boto3: 1.38.25
> cohere: 5.15.0
> dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
> filetype: 1.2.0
> google-ai-generativelanguage: 0.6.18
> httpx: 0.28.1
> httpx-sse<1,>=0.3.1: Installed. No version info available.
> httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
> httpx<1,>=0.25.2: Installed. No version info available.
> httpx>=0.25.2: Installed. No version info available.
> jsonpatch<2.0,>=1.33: Installed. No version info available.
> langchain-anthropic;: Installed. No version info available.
> langchain-aws;: Installed. No version info available.
> langchain-azure-ai;: Installed. No version info available.
> langchain-cohere;: Installed. No version info available.
> langchain-community;: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.49: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.58: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.68: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.70: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.74: Installed. No version info available.
> langchain-core<2.0.0,>=0.3.75: Installed. No version info available.
> langchain-deepseek;: Installed. No version info available.
> langchain-fireworks;: Installed. No version info available.
> langchain-google-genai;: Installed. No version info available.
> langchain-google-vertexai;: Installed. No version info available.
> langchain-groq;: Installed. No version info available.
> langchain-huggingface;: Installed. No version info available.
> langchain-mistralai;: Installed. No version info available.
> langchain-ollama;: Installed. No version info available.
> langchain-openai;: Installed. No version info available.
> langchain-openai<0.4,>=0.3.28: Installed. No version info available.
> langchain-perplexity;: Installed. No version info available.
> langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
> langchain-together;: Installed. No version info available.
> langchain-xai;: Installed. No version info available.
> langchain<1.0.0,>=0.3.23: Installed. No version info available.
> langsmith-pyo3: Installed. No version info available.
> langsmith<0.4,>=0.1.125: Installed. No version info available.
> langsmith<0.4,>=0.1.17: Installed. No version info available.
> langsmith>=0.3.45: Installed. No version info available.
> numpy: 1.26.4
> numpy<3,>=1.26.2: Installed. No version info available.
> ollama<1.0.0,>=0.5.1: Installed. No version info available.
> openai-agents: Installed. No version info available.
> openai<2.0.0,>=1.99.9: Installed. No version info available.
> opentelemetry-api: 1.34.1
> opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
> opentelemetry-sdk: 1.34.1
> orjson: 3.10.15
> orjson>=3.10.1: Installed. No version info available.
> packaging: 24.2
> packaging>=23.2: Installed. No version info available.
> pydantic: 2.10.6
> pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
> pydantic<3,>=2: Installed. No version info available.
> pydantic<3.0.0,>=2.7.4: Installed. No version info available.
> pydantic>=2.7.4: Installed. No version info available.
> pytest: 8.3.5
> PyYAML>=5.3: Installed. No version info available.
> requests: 2.32.4
> requests-toolbelt: 1.0.0
> requests<3,>=2: Installed. No version info available.
> rich: 13.9.4
> SQLAlchemy<3,>=1.4: Installed. No version info available.
> tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
> tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
> tiktoken<1,>=0.7: Installed. No version info available.
> tokenizers<1,>=0.15.1: Installed. No version info available.
> types-pyyaml: 6.0.12.20250516
> typing-extensions>=4.7: Installed. No version info available.
> zstandard: 0.23.0

This is documented:

class UsageMetadata(TypedDict):
    """Usage metadata for a message, such as token counts.

    This is a standard representation of token usage that is consistent across models.

    ...

    """

    input_tokens: int
    """Count of input (or prompt) tokens. Sum of all input token types."""

`input_tokens` is defined as the "Sum of all input token types", which includes cached tokens. `input_tokens` consistently means "total input tokens processed" across models/providers.

@mdrxy I'd argue the documentation can do a lot more here. E.g. you had to explicitly state "which includes cached tokens", a detail that isn't in the docstring.

Provider-specific examples would also help here, to showcase what each provider sends back and how it is converted to UsageMetadata.
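To illustrate the kind of example I mean — this is a hypothetical sketch of the Anthropic-to-UsageMetadata conversion inferred from the observed values above, not the actual langchain-anthropic implementation, and the function name is mine:

```python
def to_usage_metadata(raw: dict) -> dict:
    """Hypothetical: fold Anthropic's raw usage block into LangChain's
    standard UsageMetadata shape, where input_tokens is the sum of all
    input token types (non-cached + cache reads + cache writes)."""
    cache_read = raw.get("cache_read_input_tokens", 0)
    cache_creation = raw.get("cache_creation_input_tokens", 0)
    # Anthropic's raw `input_tokens` counts only non-cached tokens;
    # LangChain adds the cached counts back in.
    input_tokens = raw["input_tokens"] + cache_read + cache_creation
    output_tokens = raw["output_tokens"]
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "input_token_details": {
            "cache_read": cache_read,
            "cache_creation": cache_creation,
        },
    }


# Raw usage block from the invoke() response captured above
raw = {
    "input_tokens": 3,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 151995,
    "output_tokens": 614,
}

print(to_usage_metadata(raw))
# {'input_tokens': 151998, 'output_tokens': 614, 'total_tokens': 152612,
#  'input_token_details': {'cache_read': 151995, 'cache_creation': 0}}
```

A worked example like this in the docs would make the "sum of all input token types" semantics unambiguous per provider.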

I saw that you opened #32830, which I appreciate and will monitor!