[FEATURE] Support a fixed, injected `AsyncOpenAI` client to enable alternative interface-compatible clients
Opened this issue · 3 comments
Problem Statement
Many users need to supply an alternative, interface-compatible implementation of the OpenAI client (e.g., a `GuardrailsAsyncOpenAI` wrapper to implement OpenAI Guardrails). Today the SDK creates a new `AsyncOpenAI` client per request to avoid sharing HTTPX connections across event loops. This makes it impossible to:
- Inject a pre-configured, guardrails-enabled client.
- Reuse connection pools efficiently within a single event loop/worker.
- Centralise observability, retries, timeouts and networking policy on one client.
Separately, Strands currently has very limited support for guardrails outside of Bedrock Guardrails, so users commonly reach for OpenAI-side guardrails via a wrapper client. Without support for injecting a fixed client, this is not possible.
Proposed Solution
Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown. Continue to support current behaviour when no client is provided (backwards compatible).
Key changes (additive, non-breaking):
- Constructor injection
  - `OpenAIModel(client: Optional[Client] = None, client_args: Optional[dict] = None, …)`
  - If `client` is provided, reuse it and do not create/close a new client internally.
  - If `client` is `None`, retain current behaviour (construct an ephemeral client).
- Lifecycle guidance in docs
  - Recommend creating one client per worker/event loop (e.g., FastAPI lifespan startup/shutdown).
  - Emphasise that clients should not be shared across event loops, but can be safely reused across tasks within a loop.
- Acceptance criteria
  - Works with any `AsyncOpenAI`-compatible interface (e.g., `GuardrailsAsyncOpenAI`, custom proxies, instrumentation wrappers).
  - Streaming and structured output paths both reuse the injected client.
  - Clear examples for FastAPI and generic asyncio.
Code sketch (constructor + reuse):
```python
from typing import Any, Optional, Protocol

from openai import AsyncOpenAI


class Client(Protocol):
    @property
    def chat(self) -> Any: ...


class OpenAIModel(Model):
    def __init__(self, client: Optional[Client] = None, client_args: Optional[dict] = None, **config):
        self.client = client
        self._owns_client = client is None
        self.client_args = client_args or {}
        self.config = dict(config)

    async def stream(...):
        request = self.format_request(...)
        if self.client is not None:
            # Reuse injected client
            response = await self.client.chat.completions.create(**request)
            ...
        else:
            # Back-compat
            async with AsyncOpenAI(**self.client_args) as c:
                response = await c.chat.completions.create(**request)
                ...
```

Example usage (FastAPI lifespan, per-worker client):
```python
from fastapi import FastAPI
from openai import AsyncOpenAI
# from my_guardrails import GuardrailsAsyncOpenAI

app = FastAPI()


@app.on_event("startup")
async def startup():
    base = AsyncOpenAI()
    app.state.oai = base  # or GuardrailsAsyncOpenAI(base)
    app.state.model = OpenAIModel(client=app.state.oai, model_id="gpt-4o")


@app.on_event("shutdown")
async def shutdown():
    await app.state.oai.close()
```

Use Case
- Guardrails: Wrap the OpenAI client with `GuardrailsAsyncOpenAI` to enforce content filters, schema validation, and redaction before responses reach application code (see the wrapper sketch after this section).
- Observability & policy: Centralise timeouts, retries, logging, tracing, and network egress policy (e.g., a custom `httpx.AsyncClient`).
- Performance: Reuse keep-alive connections and connection pools within a worker/event loop for lower latency and higher throughput.
- Multi-model routing: Swap the injected client to target proxies or gateways without touching model code (e.g., toggling `base_url`, auth, or headers).
This would help with:
- Meeting compliance requirements where guardrails must run before responses are consumed.
- Reducing tail latency by avoiding per-request client construction.
- Simplifying integration with enterprise networking and telemetry.
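For illustration, here is a minimal sketch of what an interface-compatible wrapper could look like. This is not the real `GuardrailsAsyncOpenAI` from openai-guardrails-python; the class names and delegation structure are hypothetical and only show the shape of client the proposed `client=` parameter would accept.

```python
# Hypothetical sketch only - not the real GuardrailsAsyncOpenAI API.
# Shows the shape of an "interface-compatible" client: anything exposing
# `.chat.completions.create(...)` (plus `close()` for shutdown).
from types import SimpleNamespace
from typing import Any

from openai import AsyncOpenAI


class GuardedCompletions:
    """Delegates to the wrapped `chat.completions` and post-processes responses."""

    def __init__(self, completions: Any) -> None:
        self._completions = completions

    async def create(self, **kwargs: Any) -> Any:
        response = await self._completions.create(**kwargs)
        # A real guardrails client would validate/redact the response here before
        # it reaches application code (streaming responses need their own handling).
        return response


class FilteringAsyncOpenAI:
    """Hypothetical interface-compatible wrapper around AsyncOpenAI."""

    def __init__(self, inner: AsyncOpenAI) -> None:
        self._inner = inner
        self.chat = SimpleNamespace(
            completions=GuardedCompletions(inner.chat.completions)
        )

    async def close(self) -> None:
        await self._inner.close()


# With the proposed constructor injection:
# model = OpenAIModel(client=FilteringAsyncOpenAI(AsyncOpenAI()), model_id="gpt-4o")
```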
Alternative Solutions
- Create a new client per request
  - Pros: Safe w.r.t. event-loop boundaries; current behaviour.
  - Cons: Loses pooling; higher latency and allocation overhead; hard to apply cross-cutting concerns (guardrails, tracing) consistently.
- Global client shared across event loops
  - Pros: Simple in theory.
  - Cons: Unsafe; HTTPX pools cannot be shared across loops; leads to intermittent runtime errors.
- Disable pooling (force `Connection: close`)
  - Pros: Avoids cross-loop sharing issues.
  - Cons: Sacrifices performance; still doesn’t enable easy injection of guardrails wrappers.
Additional Context
- Rationale: HTTPX connection pools are not shareable across asyncio event loops; reuse is safe within a loop.
- Need: Strands’ current guardrails support focuses on Bedrock; many users need OpenAI-side guardrails today.
- The OpenAI Python SDK supports async client reuse and custom HTTP clients (`http_client=`), making injection straightforward (see the sketch below).
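As a sketch, centralising networking policy on one client could look like the following; `http_client=` and `max_retries=` are standard `AsyncOpenAI` constructor arguments, while the commented-out last lines assume the proposed `client=` injection.

```python
# One client per worker/event loop, carrying the shared networking policy.
import httpx
from openai import AsyncOpenAI

http_client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0, connect=5.0),
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
)
client = AsyncOpenAI(http_client=http_client, max_retries=3)

# model = OpenAIModel(client=client, model_id="gpt-4o")  # proposed injection
# ... and at worker shutdown: await client.close()
```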
If useful, I’m happy to contribute a PR with the constructor change, a small _stream_with_client helper, tests, and docs.
My general thoughts:
> compatible client, created once per worker/event loop
@pgrayy - any reason against creating a client per agent.invocation? Curious if we know of any downsides to that approach. I'm generally inclined otherwise though there'd be a bit of plumbing to do it
> Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown
These seem at odds with each other - "created once per worker/event loop" and "at application startup and closed at shutdown" are different points. The latter is specifically what we're working around by creating a client per-request
Requirements-wise:
> - Inject a pre-configured, guardrails-enabled client.
> - Centralise observability, retries, timeouts and networking policy on one client.

I think both of these would be satisfied with the idea of a `client_factory` which is invoked when a new client is needed.

> - Reuse connection pools efficiently within a single event loop/worker.

Whereas I'm less sure about this one - a `client_factory` might satisfy it, but I'm not certain.
This actually came up for discussion in #1036. We also brought up the idea of a client factory. In the meantime, for the issue noted in that other ticket, we suggested deriving a custom `httpx.AsyncClient` that is passed through in `client_args` under the `http_client` field (not sure if it also applies here fully).
In short, I would also like to explore this idea of a client factory. It seems, though, that the suggestion here is to pass in an OpenAI client rather than a lower-level httpx client. Would have to give this some thought.
Hey @pgrayy,
Thanks for your response. Providing an LLM client factory would be preferable, as this would give more flexibility to implement things like observability and guardrails at the client level. It would also allow the agent-level code to remain somewhat client/LLM agnostic.
More specifically, this would make it possible to use the OpenAI Guardrails package, which is implemented at the client level:
https://github.com/openai/openai-guardrails-python
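To make the `client_factory` idea above concrete, here is a minimal sketch; the parameter name, default behaviour, and call site are hypothetical and only illustrate the direction of the discussion, not an agreed design.

```python
# Hypothetical sketch of the client_factory idea; `Model` is the Strands base
# class as in the issue's sketch, and all names here are illustrative.
from typing import Callable, Optional

from openai import AsyncOpenAI


class OpenAIModel(Model):
    def __init__(
        self,
        client_factory: Optional[Callable[[], AsyncOpenAI]] = None,
        client_args: Optional[dict] = None,
        **config,
    ):
        # Default preserves today's behaviour: construct a fresh client when needed.
        self._client_factory = client_factory or (
            lambda: AsyncOpenAI(**(client_args or {}))
        )
        self.config = dict(config)

    async def stream(self, request: dict):
        # Invoked whenever a client is needed. A user-supplied factory can return
        # a shared per-event-loop client (e.g. a guardrails wrapper); ownership and
        # closing of factory-created clients is one of the open questions above.
        client = self._client_factory()
        return await client.chat.completions.create(**request)


# Usage: one pre-configured client per worker/event loop, reused via the factory.
shared = AsyncOpenAI()  # or a GuardrailsAsyncOpenAI-style wrapper
model = OpenAIModel(client_factory=lambda: shared, model_id="gpt-4o")
```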