[FEATURE] Support a fixed, injected `AsyncOpenAI` client to enable alternative interface-compatible clients
Opened this issue · 3 comments
Problem Statement
Many users need to supply an alternative, interface-compatible implementation of the OpenAI client (e.g., a `GuardrailsAsyncOpenAI` wrapper to implement OpenAI Guardrails). Today the SDK creates a new `AsyncOpenAI` client per request to avoid sharing HTTPX connections across event loops. This makes it impossible to:
- Inject a pre-configured, guardrails-enabled client.
- Reuse connection pools efficiently within a single event loop/worker.
- Centralise observability, retries, timeouts and networking policy on one client.
Separately, Strands currently has very limited support for guardrails outside of Bedrock Guardrails, so users commonly reach for OpenAI-side guardrails via a wrapper client. Without support for injecting a fixed client, this is not possible.
Proposed Solution
Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown. Continue to support current behaviour when no client is provided (backwards compatible).
Key changes (additive, non-breaking):
- Constructor injection
  - `OpenAIModel(client: Optional[Client] = None, client_args: Optional[dict] = None, …)`
  - If `client` is provided, reuse it and do not create/close a new client internally.
  - If `client` is `None`, retain current behaviour (construct an ephemeral client).
- Lifecycle guidance in docs
  - Recommend creating one client per worker/event loop (e.g., FastAPI lifespan startup/shutdown).
  - Emphasise that clients should not be shared across event loops, but can be safely reused across tasks within a loop.
- Acceptance criteria
  - Works with any `AsyncOpenAI`-compatible interface (e.g., `GuardrailsAsyncOpenAI`, custom proxies, instrumentation wrappers).
  - Streaming and structured output paths both reuse the injected client.
  - Clear examples for FastAPI and generic asyncio.
Code sketch (constructor + reuse):
```python
from typing import Any, Optional, Protocol

from openai import AsyncOpenAI


class Client(Protocol):
    @property
    def chat(self) -> Any: ...


class OpenAIModel(Model):
    def __init__(self, client: Optional[Client] = None, client_args: Optional[dict] = None, **config):
        self.client = client
        self._owns_client = client is None
        self.client_args = client_args or {}
        self.config = dict(config)

    async def stream(...):
        request = self.format_request(...)
        if self.client is not None:
            # Reuse injected client
            response = await self.client.chat.completions.create(**request)
            ...
        else:
            # Back-compat
            async with AsyncOpenAI(**self.client_args) as c:
                response = await c.chat.completions.create(**request)
                ...
```

Example usage (FastAPI lifespan, per-worker client):
```python
from fastapi import FastAPI
from openai import AsyncOpenAI
# from my_guardrails import GuardrailsAsyncOpenAI

app = FastAPI()


@app.on_event("startup")
async def startup():
    base = AsyncOpenAI()
    app.state.oai = base  # or GuardrailsAsyncOpenAI(base)
    app.state.model = OpenAIModel(client=app.state.oai, model_id="gpt-4o")


@app.on_event("shutdown")
async def shutdown():
    await app.state.oai.close()
```

Use Case
- Guardrails: Wrap the OpenAI client with `GuardrailsAsyncOpenAI` to enforce content filters, schema validation, and redaction before responses reach application code (see the wrapper sketch after this section).
- Observability & policy: Centralise timeouts, retries, logging, tracing, and network egress policy (e.g., a custom `httpx.AsyncClient`).
- Performance: Reuse keep-alive connections and connection pools within a worker/event loop for lower latency and higher throughput.
- Multi-model routing: Swap the injected client to target proxies or gateways without touching model code (e.g., toggling `base_url`, auth, or headers).
This would help with:
- Meeting compliance requirements where guardrails must run before responses are consumed.
- Reducing tail latency by avoiding per-request client construction.
- Simplifying integration with enterprise networking and telemetry.
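For illustration, here is a minimal sketch of what an interface-compatible wrapper could look like. This is not the real `GuardrailsAsyncOpenAI` from openai-guardrails-python; the class names and delegation structure are hypothetical and only show the shape of client the proposed `client=` parameter would accept.

```python
# Hypothetical sketch only - not the real GuardrailsAsyncOpenAI API.
# Shows the shape of an "interface-compatible" client: anything exposing
# `.chat.completions.create(...)` (plus `close()` for shutdown).
from types import SimpleNamespace
from typing import Any

from openai import AsyncOpenAI


class GuardedCompletions:
    """Delegates to the wrapped `chat.completions` and post-processes responses."""

    def __init__(self, completions: Any) -> None:
        self._completions = completions

    async def create(self, **kwargs: Any) -> Any:
        response = await self._completions.create(**kwargs)
        # A real guardrails client would validate/redact the response here before
        # it reaches application code (streaming responses need their own handling).
        return response


class FilteringAsyncOpenAI:
    """Hypothetical interface-compatible wrapper around AsyncOpenAI."""

    def __init__(self, inner: AsyncOpenAI) -> None:
        self._inner = inner
        self.chat = SimpleNamespace(
            completions=GuardedCompletions(inner.chat.completions)
        )

    async def close(self) -> None:
        await self._inner.close()


# With the proposed constructor injection:
# model = OpenAIModel(client=FilteringAsyncOpenAI(AsyncOpenAI()), model_id="gpt-4o")
```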
Alternative Solutions
- Create a new client per request
  - Pros: Safe w.r.t. event-loop boundaries; current behaviour.
  - Cons: Loses pooling; higher latency and allocation overhead; hard to apply cross-cutting concerns (guardrails, tracing) consistently.
- Global client shared across event loops
  - Pros: Simple in theory.
  - Cons: Unsafe; HTTPX pools cannot be shared across loops; leads to intermittent runtime errors.
- Disable pooling (force `Connection: close`)
  - Pros: Avoids cross-loop sharing issues.
  - Cons: Sacrifices performance; still doesn’t enable easy injection of guardrails wrappers.
Additional Context
- Rationale: HTTPX connection pools are not shareable across asyncio event loops; reuse is safe within a loop.
- Need: Strands’ current guardrails support focuses on Bedrock; many users need OpenAI-side guardrails today.
- The OpenAI Python SDK supports async client reuse and custom HTTP clients (`http_client=`), making injection straightforward (see the sketch below).
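As a sketch, centralising networking policy on one client could look like the following; `http_client=` and `max_retries=` are standard `AsyncOpenAI` constructor arguments, while the commented-out last lines assume the proposed `client=` injection.

```python
# One client per worker/event loop, carrying the shared networking policy.
import httpx
from openai import AsyncOpenAI

http_client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0, connect=5.0),
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
)
client = AsyncOpenAI(http_client=http_client, max_retries=3)

# model = OpenAIModel(client=client, model_id="gpt-4o")  # proposed injection
# ... and at worker shutdown: await client.close()
```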
If useful, I’m happy to contribute a PR with the constructor change, a small _stream_with_client helper, tests, and docs.
My general thoughts:
> compatible client, created once per worker/event loop
@pgrayy - any reason against creating a client per agent.invocation? Curious if we know of any downsides to that approach. I'm generally inclined otherwise though there'd be a bit of plumbing to do it
> Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown
These seem at odds with each other - "created once per worker/event loop" and "at application startup and closed at shutdown" are different points. The latter is specifically what we're working around by creating a client per-request
Requirements-wise:
> - Inject a pre-configured, guardrails-enabled client.
> - Centralise observability, retries, timeouts and networking policy on one client.

I think both of these would be satisfied with the idea of a `client_factory` which is invoked when a new client is needed.

> - Reuse connection pools efficiently within a single event loop/worker.

Whereas I'm less sure about this one - a `client_factory` might satisfy it, but I'm not certain.
This actually came up for discussion in #1036. We also brought up the idea of a client factory. In the meantime, for the issue noted in that other ticket, we suggested deriving a custom `httpx.AsyncClient` that is passed through in `client_args` under the `http_client` field (not sure if it also applies here fully).
In short, I would also like to explore this idea of a client factory. It seems, though, that the suggestion here is to pass in an OpenAI client rather than a lower-level httpx client. Would have to give this some thought.
Hey @pgrayy,
Thanks for your response. Providing an LLM client factory would be preferable, as this would give more flexibility to implement things like observability and guardrails at the client level. It would also allow the agent-level code to remain somewhat client/LLM agnostic.
More specifically, this would make it possible to use the OpenAI Guardrails package, which is implemented at the client level:
https://github.com/openai/openai-guardrails-python
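To make the `client_factory` idea above concrete, here is a minimal sketch; the parameter name, default behaviour, and call site are hypothetical and only illustrate the direction of the discussion, not an agreed design.

```python
# Hypothetical sketch of the client_factory idea; `Model` is the Strands base
# class as in the issue's sketch, and all names here are illustrative.
from typing import Callable, Optional

from openai import AsyncOpenAI


class OpenAIModel(Model):
    def __init__(
        self,
        client_factory: Optional[Callable[[], AsyncOpenAI]] = None,
        client_args: Optional[dict] = None,
        **config,
    ):
        # Default preserves today's behaviour: construct a fresh client when needed.
        self._client_factory = client_factory or (
            lambda: AsyncOpenAI(**(client_args or {}))
        )
        self.config = dict(config)

    async def stream(self, request: dict):
        # Invoked whenever a client is needed. A user-supplied factory can return
        # a shared per-event-loop client (e.g. a guardrails wrapper); ownership and
        # closing of factory-created clients is one of the open questions above.
        client = self._client_factory()
        return await client.chat.completions.create(**request)


# Usage: one pre-configured client per worker/event loop, reused via the factory.
shared = AsyncOpenAI()  # or a GuardrailsAsyncOpenAI-style wrapper
model = OpenAIModel(client_factory=lambda: shared, model_id="gpt-4o")
```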