strands-agents/sdk-python

[FEATURE] Re-try limit with structured_output

Opened this issue · 6 comments

Problem Statement

Requests using structured_output seem to be able to retry forever when a validation error is thrown.

NOTE: There's already an open PR addressing this: #1026
Edit: Whoops, it's now closed, and it apparently targeted the structured_output implementation prior to v1.14.0.

But I'm opening an issue in hopes of prioritization / discussion, as I'd rather keep non-code discussion outside of PRs.

Proposed Solution

A max-retries limit is the preferred solution.

Use Case

Prevent an infinite loop of LLM retry attempts for prompts where the model may be unable to produce the expected output.

Alternative Solutions

No response

Additional Context

The first thing I tried with the new structured_output implementation was updating a test Pydantic model with:

class TestModel(BaseModel):
  # [...]
  give_me_a_string: int

...Just corner-case testing, and I was not surprised it couldn't handle this. But I would prefer confidence that this will not happen in production.

Stream:

  ERROR:strands.tools.structured_output.structured_output_tool:tool_name=<TestModel> | structured output validation failed | error_message=<Validation failed for TestModel. Please fix the following errors:
- Field 'give_me_a_string': Field required>
<thinking> It there was an error due to a missing field in the tool call. I will correct the error and resubmit the analysis without the unnecessary field. </thinking> 
Tool #2: TestModel
ERROR:strands.tools.structured_output.structured_output_tool:tool_name=<TestModel> | structured output validation failed | error_message=<Validation failed for TestModel. Please fix the following errors:
- Field 'give_me_a_string': Field required>

It went on to retry 40 times until being rate limited by AWS Bedrock.
This is concerning, and we'd love to see a max_retries option or a more defensive way to handle this (still testing, so maybe it's already possible).

Hi, in 1.14.0 we introduced a new mechanism for structured_output, which we expect to be much more robust. This deprecates the existing

agent.structured_output

pattern in favor of

result = agent(structured_output_model=SomeModel)
some_model: SomeModel = result.structured_output

You can read more about this here https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/structured-output/#basic-usage
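
For reference, here's a minimal sketch of the migration (SomeModel and the prompt string are just placeholders; the deprecated call is shown only for contrast):

from pydantic import BaseModel
from strands import Agent

class SomeModel(BaseModel):
    answer: str

agent = Agent()

# Deprecated pattern
legacy: SomeModel = agent.structured_output(SomeModel, "Answer the question ...")

# New pattern (1.14.0+)
result = agent("Answer the question ...", structured_output_model=SomeModel)
some_model: SomeModel = result.structured_output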

Please let us know if you're having success with this

Hi @dbschmigelski, I am having success with it and appreciate the new implementation!

This was using the new implementation with...

import logging

from strands import Agent

logger = logging.getLogger(__name__)

DEFAULT_MODEL_ID = "us.amazon.nova-lite-v1:0"

def strands_analysis(prompt: str, model_id: str = DEFAULT_MODEL_ID):
    # SYS_PROMPT, TestModel, and _genUserPrompt are defined elsewhere in our codebase
    agent = Agent(model=model_id, system_prompt=SYS_PROMPT)
    result = agent(_genUserPrompt(prompt), structured_output_model=TestModel)

    # The validated Pydantic instance is exposed on the agent result
    output = result.structured_output

    usage = {
        "inputTokens": result.metrics.accumulated_usage["inputTokens"],
        "outputTokens": result.metrics.accumulated_usage["outputTokens"],
    }
    logger.info("LLM result and usage", extra={"usage": usage, "result": output})

    if not isinstance(output, TestModel):
        raise ValueError("Structured output is not of expected type TestModel")

    return output

I have found it more reliable than the old implementation, and I appreciate the default retry behavior and the easier access to token usage.

But I'm not sure if there's a recommended way to limit retries if it continually cannot produce a valid structured output.

All glory to @afarntrog

Regarding,

But I'm not sure if there's a recommended way to limit retries if it continually cannot produce a valid structured output.

I suppose this is something that has now been shifted onto the user. I think there are three options:

  1. In your system prompt, you can describe limiting the usage
  2. You can use a hook, so that if the StructuredOutputTool fails too many times in a row you can block it (see the sketch after this list)
  3. Perhaps this is a follow-up feature where we implement option 2 within the SDK but expose some "max_structured_output_retries" flag
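
For option 2, here's a rough sketch of such a hook. The event name, import path, and the tool_use/result field shapes are based on the hooks docs and may differ slightly across SDK versions; it also assumes the structured-output tool is named after the Pydantic model (as in the log above) and that an exception raised from a hook callback propagates out of the agent call:

from strands import Agent
from strands.hooks import HookProvider, HookRegistry, AfterToolInvocationEvent
# NOTE: in some SDK versions the tool-level events live in strands.experimental.hooks

class StructuredOutputRetryLimit(HookProvider):
    """Bail out once structured output validation fails too many times in a row."""

    def __init__(self, tool_name: str, max_failures: int = 3):
        self.tool_name = tool_name  # the structured-output tool is named after the model
        self.max_failures = max_failures
        self.failures = 0

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(AfterToolInvocationEvent, self.on_tool_result)

    def on_tool_result(self, event: AfterToolInvocationEvent) -> None:
        if event.tool_use["name"] != self.tool_name:
            return
        # A failed validation surfaces as an error-status tool result
        if event.result["status"] == "error":
            self.failures += 1
        else:
            self.failures = 0
        if self.failures >= self.max_failures:
            raise RuntimeError(
                f"{self.tool_name} failed validation {self.failures} times in a row; aborting"
            )

agent = Agent(hooks=[StructuredOutputRetryLimit(tool_name="TestModel")])

Counting consecutive failures (rather than total calls) keeps legitimate multi-step tool use working while still cutting off a validation loop.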

Perhaps this is a follow-up feature where we implement option 2 within the SDK but expose some "max_structured_output_retries" flag

I think this is important enough to be supported natively by the SDK. We just had a retry loop run for 5 hours with a significant cost spike.

Note: Thanks to the team for the improved structured outputs; we benchmarked its token efficiency against litellm and it uses about 40% fewer tokens on our use case, which is fantastic.

We just had a retry loop run for 5 hours with a significant cost spike.

Sorry about this, and thanks for highlighting it.

We'll want to address this specifically; in the meantime, we do have an example hook for limiting tool calls: https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/hooks/#limit-tool-counts