Anthropic: Usage metadata is inaccurate for prompt cache reads/writes
Closed this issue · 2 comments
Checked other resources
- This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
import httpx
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import SystemMessage

ADVENTURES_OF_SHERLOCK_HOLMES = httpx.get(
    "https://www.gutenberg.org/ebooks/1661.txt.utf-8", follow_redirects=True
).text

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Summarize the adventures of Sherlock Holmes.",
                "cache_control": {"type": "ephemeral", "ttl": "5m"},
            }
        ],
    },
]

system = SystemMessage(
    content=[
        {
            "type": "text",
            "text": "You are a helpful assistant that is a literary expert. Here's the text of the adventures of Sherlock Holmes.",
        },
        {
            "type": "text",
            "text": ADVENTURES_OF_SHERLOCK_HOLMES,
            "cache_control": {"type": "ephemeral", "ttl": "5m"},
        },
    ]
)

llm = ChatAnthropic(model_name="claude-sonnet-4-20250514")

final_chunk = None
for chunk in llm.stream(input=[system, *messages]):
    final_chunk = chunk if final_chunk is None else final_chunk + chunk
print(final_chunk.usage_metadata)

ai_message = llm.invoke(input=[system, *messages])
print(ai_message.usage_metadata)
Error Message and Stack Trace (if applicable)
No response
Description
I'm making a simple request to Anthropic that uses prompt caching for both the system message and the messages array. With a warmed cache (e.g. after running the script once beforehand):
# streaming `usage_metadata` is
{'input_tokens': 151998, 'output_tokens': 691, 'total_tokens': 152689, 'input_token_details': {'cache_creation': 0, 'cache_read': 151995}}
# invoke() `usage_metadata` is
{'input_tokens': 151998, 'output_tokens': 614, 'total_tokens': 152612, 'input_token_details': {'cache_read': 151995, 'cache_creation': 0, 'ephemeral_5m_input_tokens': 0, 'ephemeral_1h_input_tokens': 0}}
However, the actual server response (captured via mitmproxy --mode reverse:https://api.anthropic.com -p 8888) reveals the following for stream():
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"input_tokens":3,"cache_creation_input_tokens":0,"cache_read_input_tokens":151995,"output_tokens":691}}
And for invoke():
"usage": {
"input_tokens": 3,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 151995,
"cache_creation": {
"ephemeral_5m_input_tokens": 0,
"ephemeral_1h_input_tokens": 0
},
"output_tokens": 614,
"service_tier": "standard"
}
usage_metadata["input_tokens"] is thus incorrect: it reports 151998 rather than the server's input_tokens of 3, because it also counts cached tokens. To reconstruct the server's value with LangChain, one would have to compute:
usage_metadata["input_tokens"] - usage_metadata["input_token_details"]["cache_read"] - usage_metadata["input_token_details"]["cache_creation"]
If this behavior is intended, it needs to be clarified in the docstrings and documentation.
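For concreteness, here is a minimal sketch of that reconstruction as a helper function (the helper name is mine; the dict keys are the ones LangChain returned in the outputs above):

```python
def uncached_input_tokens(usage_metadata: dict) -> int:
    """Recover the non-cached input token count from LangChain's usage_metadata."""
    details = usage_metadata.get("input_token_details", {})
    return (
        usage_metadata["input_tokens"]
        - details.get("cache_read", 0)
        - details.get("cache_creation", 0)
    )

# Using the invoke() usage_metadata reported above:
usage = {
    "input_tokens": 151998,
    "output_tokens": 614,
    "total_tokens": 152612,
    "input_token_details": {"cache_read": 151995, "cache_creation": 0},
}
print(uncached_input_tokens(usage))  # 3, matching the server's raw input_tokens
```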
System Info
System Information
------------------
> OS: Darwin
> OS Version: Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:51 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8112
> Python Version: 3.13.2 (main, Feb 12 2025, 14:59:08) [Clang 19.1.6 ]
Package Information
-------------------
> langchain_core: 0.3.75
> langchain: 0.3.25
> langchain_community: 0.3.21
> langsmith: 0.3.45
> langchain_anthropic: 0.3.19
> langchain_aws: 0.2.26
> langchain_cohere: 0.4.4
> langchain_google_genai: 2.1.8
> langchain_mistralai: 0.2.10
> langchain_ollama: 0.3.4
> langchain_openai: 0.3.30
> langchain_text_splitters: 0.3.11
> langchain_xai: 0.2.5
> langgraph_sdk: 0.1.70
Optional packages not installed
-------------------------------
> langserve
Other Dependencies
------------------
> aiohttp<4,>=3.9.1: Installed. No version info available.
> aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
> anthropic<1,>=0.64.0: Installed. No version info available.
> async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
> boto3: 1.38.25
> cohere: 5.15.0
> dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
> filetype: 1.2.0
> google-ai-generativelanguage: 0.6.18
> httpx: 0.28.1
> httpx-sse<1,>=0.3.1: Installed. No version info available.
> httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
> httpx<1,>=0.25.2: Installed. No version info available.
> httpx>=0.25.2: Installed. No version info available.
> jsonpatch<2.0,>=1.33: Installed. No version info available.
> langchain-anthropic;: Installed. No version info available.
> langchain-aws;: Installed. No version info available.
> langchain-azure-ai;: Installed. No version info available.
> langchain-cohere;: Installed. No version info available.
> langchain-community;: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.49: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.58: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.68: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.70: Installed. No version info available.
> langchain-core<1.0.0,>=0.3.74: Installed. No version info available.
> langchain-core<2.0.0,>=0.3.75: Installed. No version info available.
> langchain-deepseek;: Installed. No version info available.
> langchain-fireworks;: Installed. No version info available.
> langchain-google-genai;: Installed. No version info available.
> langchain-google-vertexai;: Installed. No version info available.
> langchain-groq;: Installed. No version info available.
> langchain-huggingface;: Installed. No version info available.
> langchain-mistralai;: Installed. No version info available.
> langchain-ollama;: Installed. No version info available.
> langchain-openai;: Installed. No version info available.
> langchain-openai<0.4,>=0.3.28: Installed. No version info available.
> langchain-perplexity;: Installed. No version info available.
> langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
> langchain-together;: Installed. No version info available.
> langchain-xai;: Installed. No version info available.
> langchain<1.0.0,>=0.3.23: Installed. No version info available.
> langsmith-pyo3: Installed. No version info available.
> langsmith<0.4,>=0.1.125: Installed. No version info available.
> langsmith<0.4,>=0.1.17: Installed. No version info available.
> langsmith>=0.3.45: Installed. No version info available.
> numpy: 1.26.4
> numpy<3,>=1.26.2: Installed. No version info available.
> ollama<1.0.0,>=0.5.1: Installed. No version info available.
> openai-agents: Installed. No version info available.
> openai<2.0.0,>=1.99.9: Installed. No version info available.
> opentelemetry-api: 1.34.1
> opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
> opentelemetry-sdk: 1.34.1
> orjson: 3.10.15
> orjson>=3.10.1: Installed. No version info available.
> packaging: 24.2
> packaging>=23.2: Installed. No version info available.
> pydantic: 2.10.6
> pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
> pydantic<3,>=2: Installed. No version info available.
> pydantic<3.0.0,>=2.7.4: Installed. No version info available.
> pydantic>=2.7.4: Installed. No version info available.
> pytest: 8.3.5
> PyYAML>=5.3: Installed. No version info available.
> requests: 2.32.4
> requests-toolbelt: 1.0.0
> requests<3,>=2: Installed. No version info available.
> rich: 13.9.4
> SQLAlchemy<3,>=1.4: Installed. No version info available.
> tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
> tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
> tiktoken<1,>=0.7: Installed. No version info available.
> tokenizers<1,>=0.15.1: Installed. No version info available.
> types-pyyaml: 6.0.12.20250516
> typing-extensions>=4.7: Installed. No version info available.
> zstandard: 0.23.0
This is documented:
class UsageMetadata(TypedDict):
    """Usage metadata for a message, such as token counts.

    This is a standard representation of token usage that is consistent across models.
    ...
    """

    input_tokens: int
    """Count of input (or prompt) tokens. Sum of all input token types."""
input_tokens is defined as "Sum of all input token types", which includes cached tokens. input_tokens consistently means "total tokens processed" across models/providers.
@mdrxy I'd argue the documentation could do a lot more here. For example, you had to explicitly mention "which includes cached tokens", which isn't in the docstring.
Provider-specific examples would also help, to showcase what each provider sends back and how it is converted to UsageMetadata.
Saw that you opened #32830, which I appreciate and will monitor!