openai/openai-agents-python

Silence Manager in Voice

Closed this issue · 4 comments

Eliminate awkward conversational gaps in voice interactions. While short pauses in text-based chats are harmless, in voice-based experiences they can create tension, confusion, or even cause the user to abandon the call. This feature ensures that every moment of the conversation feels natural, engaging, and reassuring—regardless of unexpected delays.

How It Works

Turn-End Timer: After each speaking turn (bot or user), a timer starts. If no response is detected within a configurable threshold (e.g., 5 seconds), the Silence Manager automatically triggers an appropriate action.
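The turn-end timer could be sketched as a restartable asyncio countdown. This is a minimal illustration, not part of the SDK; the class and method names here are made up for the example.

```python
import asyncio


class TurnEndTimer:
    """Restartable countdown: fires `on_silence` if not reset within `threshold_s`.

    Hypothetical helper for illustration only; not an SDK API.
    """

    def __init__(self, threshold_s: float, on_silence) -> None:
        self.threshold_s = threshold_s
        self.on_silence = on_silence  # async callback invoked on timeout
        self._task: asyncio.Task | None = None

    def reset(self) -> None:
        # Call this whenever a speaking turn ends (bot or user): it restarts
        # the countdown, cancelling any countdown already in flight.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.create_task(self._wait())

    async def _wait(self) -> None:
        try:
            await asyncio.sleep(self.threshold_s)
            await self.on_silence()
        except asyncio.CancelledError:
            pass  # a new turn arrived in time; nothing to do

    def cancel(self) -> None:
        if self._task is not None:
            self._task.cancel()
```

In a real app, `reset()` would be wired to turn-completion events from the voice layer, and `on_silence` would play the filler prompt or hold music.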

Context-Aware Prompts:

If the bot is silent: The system plays a short, friendly filler such as “One moment please…” followed by light hold music, indicating that the bot is processing a heavy task or retrieving information.

If the user is silent: The bot offers gentle reassurance and guidance, e.g., “No rush. When you’re ready, just say my name to continue.” The bot then goes on hold, waiting to be woken up (optionally, soft background music can play to reduce perceived tension). An alternative is to say: “Are you still there?”
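The two context-aware branches above amount to a small dispatch on who went quiet, plus an escalating check-in for the user. A sketch (the function name and exact wording are illustrative only):

```python
def silence_prompt(silent_party: str, nudge_count: int = 0) -> str:
    """Pick a context-aware filler line. Illustrative policy, not an SDK API."""
    if silent_party == "bot":
        # Bot is busy processing: acknowledge, then hold music can follow.
        return "One moment please..."
    # User is silent: gentle reassurance first, a direct check-in on repeats.
    if nudge_count == 0:
        return "No rush. When you're ready, just say my name to continue."
    return "Are you still there?"
```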

Customizable Settings: Developers can configure thresholds, music type, and fallback prompts to match brand personality.
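The configurable surface described above could be captured in a small settings object. All field names and defaults below are assumptions for the sketch, not an existing SDK type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SilenceManagerConfig:
    """Hypothetical settings bundle for the proposed Silence Manager."""

    bot_silence_threshold_s: float = 5.0       # bot-side processing delay limit
    user_silence_threshold_s: float = 8.0      # user-side inactivity limit
    hold_music_path: Optional[str] = None      # e.g. a path to a brand jingle
    bot_filler_prompt: str = "One moment please..."
    user_reassurance_prompt: str = "No rush. When you're ready, just say my name."
    max_user_nudges: int = 2                   # escalate after this many check-ins
```

Developers would tune thresholds, prompts, and music per brand; thresholds for bot and user silence are kept separate since a thinking bot and an idle user warrant different patience.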

Error Handling: If the silence is caused by a system error, the feature can escalate to a fail-safe action, such as apologizing or offering to reconnect.
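The escalation logic could be a simple policy function: distinguish system-error silence from user silence, and fail safe after too many unanswered nudges. A hedged sketch (action names are invented for illustration):

```python
from typing import Optional


def escalate_on_silence(
    error: Optional[Exception], nudges_sent: int, max_nudges: int
) -> str:
    """Decide the fail-safe action after prolonged silence (illustrative policy)."""
    if error is not None:
        # System fault caused the silence: apologize and offer to reconnect.
        return "apologize_and_reconnect"
    if nudges_sent >= max_nudges:
        # User is likely gone; end the call gracefully rather than loop forever.
        return "end_call_gracefully"
    return "send_nudge"  # keep trying gently
```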

Benefits

Prevents conversation breakdowns caused by long silences.
Improves user comfort and trust by providing clear signals during processing delays.
Reduces drop-off rates and enhances overall user experience.
Creates a more human-like, seamless interaction flow.

Question: If you cannot implement it in a short period of time, could you give me some guidance on how to do it with the Agents SDK for Python?

Thanks for writing in. We don't have immediate plans to add such a built-in feature, but let me share a few things.

Using the voice pipeline could be a quicker solution for your use case. Your app controls when to pass audio input chunks to the voice pipeline, and the pipeline returns audio responses based on the agent run results. Your code can also detect the duration of silence during a conversation, so it's possible to play some music instead.
See https://openai.github.io/openai-agents-python/voice/quickstart/#run-the-pipeline and other documents and examples for more details.
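The suggestion above boils down to the app tracking when audio last flowed and triggering hold audio itself once silence exceeds a threshold. Here is a minimal watchdog sketch; `last_audio_at` and `play_hold_audio` are stand-ins for whatever your app wires to the SDK's voice pipeline (see the linked quickstart for the real pipeline API):

```python
import asyncio
import time


async def watch_for_silence(
    last_audio_at, threshold_s: float, play_hold_audio, poll_s: float = 0.05
) -> None:
    """Poll a shared timestamp; trigger hold audio once silence exceeds threshold.

    `last_audio_at` is a callable returning the monotonic time of the most
    recent audio activity; `play_hold_audio` is an async callback. Both are
    hypothetical hooks for this sketch.
    """
    while True:
        if time.monotonic() - last_audio_at() > threshold_s:
            await play_hold_audio()
            return
        await asyncio.sleep(poll_s)
```

This runs alongside the pipeline as a background task; the task that feeds audio chunks updates the timestamp, and the watchdog fires only when nothing has updated it for `threshold_s` seconds.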

If you're considering the Realtime API with WebSocket connections, it's feasible to control the timing of sending a user's input audio, but it could be a challenge if playing music during long silences is a must-have for your requirements.

I understand that, even with the above information, you'll need to explore solutions on your end, but I hope this was helpful.

Thanks, Kazuhiro. We'll find some time to implement it, even if only partially, because long silences render customer service useless; the user thinks something's broken and starts talking again when it's not their turn.

This issue is stale because it has been open for 7 days with no activity.

This issue was closed because it has been inactive for 3 days since being marked as stale.