[FEATURE] Re-try limit with structured_output
Opened this issue · 6 comments
Problem Statement
Structured_output requests seem to be able to retry indefinitely when a validation error is thrown.
NOTE: There's already an open PR to address this: #1026
Edit: Whoops, that PR is now closed and apparently targeted the structured_output implementation prior to v1.14.0.
I'm opening this issue anyway in hopes of prioritization and discussion, as I'd rather keep non-code discussions outside of PRs.
Proposed Solution
A max-retries limit is the preferred solution.
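To make the request concrete, here is a minimal sketch of the behavior a max-retries limit would give, written in plain Python and independent of the SDK. The names `run_with_retry_limit` and `MaxRetriesExceeded` are hypothetical, and the `attempt` callable stands in for one model call plus validation; none of this is an existing Strands API.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class MaxRetriesExceeded(RuntimeError):
    """Raised when no valid output is produced within the retry budget (hypothetical name)."""


def run_with_retry_limit(attempt: Callable[[Optional[str]], T], max_retries: int = 3) -> T:
    """Call `attempt` at most 1 + max_retries times.

    `attempt` receives the previous validation error message (None on the
    first call) and either returns validated output or raises ValueError.
    """
    last_error: Optional[str] = None
    for _ in range(1 + max_retries):
        try:
            return attempt(last_error)
        except ValueError as exc:
            # Keep the message so the next attempt can be told what went wrong.
            last_error = str(exc)
    raise MaxRetriesExceeded(f"gave up after {max_retries} retries: {last_error}")
```

With a limit like this, a prompt that can never satisfy the schema fails fast with an exception the caller can handle, instead of looping until the provider rate-limits the account.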
Use Case
Prevent an infinite loop of LLM retry attempts for prompts where the LLM may be unable to produce the expected output.
Alternative Solutions
No response
Additional Context
The first thing I tried with the new structured_output implementation was to update a test pydantic model with:
class TestModel(BaseModel):
    # [...]
    give_me_a_string: int
...Just corner-case testing, and I was not surprised that it could not handle this. But I would prefer confidence that this will not happen in production.
Stream:
ERROR:strands.tools.structured_output.structured_output_tool:tool_name=<TestModel> | structured output validation failed | error_message=<Validation failed for TestModel. Please fix the following errors:
- Field 'give_me_a_string': Field required>
<thinking> It there was an error due to a missing field in the tool call. I will correct the error and resubmit the analysis without the unnecessary field. </thinking>
Tool #2: TestModel
ERROR:strands.tools.structured_output.structured_output_tool:tool_name=<TestModel> | structured output validation failed | error_message=<Validation failed for TestModel. Please fix the following errors:
- Field 'give_me_a_string': Field required>
It went on to retry 40 times until being rate limited by AWS Bedrock.
This is concerning, and we'd love to see a max_retries option or a more defensive way to handle this (still testing, so maybe it's already possible).
Hi, in 1.14.0 we introduced a new mechanism for structured_output which we expect to be much more robust. This deprecates the existing agent.structured_output pattern in favor of:
result = agent(structured_output_model=SomeModel)
some_model: SomeModel = result.structured_output
You can read more about this here: https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/structured-output/#basic-usage
Please let us know if you're having success with this.
Hi @dbschmigelski, I am having success with it and appreciate the new implementation!
This was using the new implementation with...
DEFAULT_MODEL_ID = "us.amazon.nova-lite-v1:0"
def strands_analysis(prompt: str, model_id: str = DEFAULT_MODEL_ID):
    agent = Agent(model=model_id, system_prompt=SYS_PROMPT)
    result = agent(_genUserPrompt(prompt), structured_output_model=TestModel)
    output = result.structured_output
    usage = {
        "inputTokens": result.metrics.accumulated_usage["inputTokens"],
        "outputTokens": result.metrics.accumulated_usage["outputTokens"],
    }
    logger.info("LLM result and usage", extra={"usage": usage, "result": output})
    if not isinstance(output, TestModel):
        raise ValueError("Structured output is not of expected type TestModel")
    return output
I have noticed it to be more reliable than the old implementation; appreciate the default-retry behavior, and greater ease of accessing token usage.
But am not sure if there's a recommended way to limit re-tries if it continually cannot produce a valid structured output.
All glory to @afarntrog
Regarding,
> But am not sure if there's a recommended way to limit re-tries if it continually cannot produce a valid structured output.
I suppose this is something that has now been shifted onto the user. I think there are 3 options:
1. In your system prompt, you can describe a limit on the tool's usage
2. You can use a hook, such that if the StructuredOutputTool is used too many times in a row with failures you can block it
3. Perhaps this is a follow-up feature where we implement option 2 within the SDK but expose some "max_structured_output_retries" flag
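The bookkeeping behind the hook option can be sketched in plain Python. This is only an illustration under stated assumptions: `ConsecutiveFailureLimiter` is a hypothetical helper, not an SDK class, and wiring it into the SDK's hook callbacks is left out because the exact event names and fields depend on the SDK version. The intended use is to call `reset()` when an agent invocation starts, `should_block()` before each tool call, and `record_result()` after each tool call.

```python
from collections import defaultdict
from threading import Lock


class ConsecutiveFailureLimiter:
    """Tracks consecutive failures per tool name (hypothetical helper, not an SDK class)."""

    def __init__(self, max_consecutive_failures: int = 3) -> None:
        if max_consecutive_failures < 1:
            raise ValueError("max_consecutive_failures must be at least 1")
        self.max_consecutive_failures = max_consecutive_failures
        self._failures = defaultdict(int)
        self._lock = Lock()  # tool calls may run concurrently

    def reset(self) -> None:
        """Forget all counts, e.g. at the start of a new agent invocation."""
        with self._lock:
            self._failures.clear()

    def record_result(self, tool_name: str, succeeded: bool) -> None:
        """A success clears the streak for that tool; a failure extends it."""
        with self._lock:
            if succeeded:
                self._failures[tool_name] = 0
            else:
                self._failures[tool_name] += 1

    def should_block(self, tool_name: str) -> bool:
        """True once the tool has failed max_consecutive_failures times in a row."""
        with self._lock:
            return self._failures.get(tool_name, 0) >= self.max_consecutive_failures


limiter = ConsecutiveFailureLimiter(max_consecutive_failures=2)
for _ in range(2):
    limiter.record_result("TestModel", succeeded=False)
print(limiter.should_block("TestModel"))  # True
```

Counting consecutive failures rather than total calls means a model that recovers after one bad attempt is not penalized, while a streak like the 40 identical validation errors reported above is cut off early.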
> Perhaps this is a follow up feature where we implement 2 within the SDK but expose some "max_structured_output_retries" flag
I think this is important enough to be supported natively by the SDK. We just had a retry loop run for 5 hours with an important cost spike.
Note: Thanks to the team for the improved structured outputs. We benchmarked its token efficiency against litellm, and it uses about 40% fewer tokens on our use case, which is fantastic.
> We just had a retry loop run for 5 hours with an important cost spike.
Sorry about this, and thanks for highlighting it.
We'll want to address this specifically; in the meantime, we do have an example hook for limiting tool calls: https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/hooks/#limit-tool-counts