disler/poc-realtime-ai-assistant

WebSocket Connection Resilience

Description

The WebSocket connection between the client and server is experiencing intermittent closures and errors, leading to interruptions in communication. This affects the reliability of real-time interactions with the AI assistant.

Steps to Reproduce

  1. Run the realtime assistant using uv run main.
  2. Initiate a session and interact with the assistant.
  3. Observe the connection status in the logs.
  4. Notice warnings like ⚠️ WebSocket connection lost. Reconnecting... and errors such as ConnectionClosedError: no close frame received or sent.

Expected Behavior

The WebSocket connection should maintain stability throughout the session, automatically handling any transient disconnections without significant interruptions.

Actual Behavior

The connection is occasionally lost, leading to warnings and errors that disrupt the communication flow between the client and server.

Potential Solutions

  • Implement Robust Reconnection Strategy: Utilize exponential backoff for reconnection attempts to avoid overwhelming the server.
  • Enhanced Error Handling: Differentiate between various connection errors and handle each appropriately.
  • Keep-Alive Mechanism: Ensure proper implementation of keep-alive pings to maintain the connection.
  • Logging Improvements: Add more detailed logging to identify the root causes of connection drops.
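The reconnection strategy proposed above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `run_with_reconnect`, `backoff_delay`, and the `session` coroutine (which would open the WebSocket and process events until the conversation ends) are hypothetical names, and the default retryable exception types are placeholders.

```python
import asyncio
import logging
import random

logger = logging.getLogger("reconnect")


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Upper bound (seconds) of the wait before retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


async def run_with_reconnect(session, retryable=(ConnectionError, OSError),
                             max_attempts=5, base=1.0, cap=30.0,
                             sleep=asyncio.sleep):
    """Run `session()` until it returns, retrying on retryable errors.

    `session` is a hypothetical coroutine function that opens the
    WebSocket and handles events; it is an assumption of this sketch.
    """
    for attempt in range(max_attempts):
        try:
            return await session()
        except retryable as exc:
            if attempt == max_attempts - 1:
                logger.error("Giving up after %d attempts: %r",
                             max_attempts, exc)
                raise
            # Full jitter: wait a random time in [0, bound] so that many
            # clients do not all reconnect at the same moment.
            delay = random.uniform(0, backoff_delay(attempt, base, cap))
            logger.warning("Connection lost (%r); retry %d in %.2fs",
                           exc, attempt + 1, delay)
            await sleep(delay)
```

If the client uses the `websockets` library, one option is to pass `retryable=(websockets.exceptions.ConnectionClosedError, OSError)` so that abnormal closures are retried while other errors still surface. For the keep-alive point, `websockets.connect` accepts `ping_interval` and `ping_timeout` arguments; suitable values depend on the server and would need to be tested.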

I found a problem in the function. Add this code to main.py, in the branch that handles input_audio_buffer.speech_stopped:

```python
elif event["type"] == "input_audio_buffer.speech_stopped":
    mic.stop_recording()
    logger.info("Speech ended, processing...")

    # Validate buffer content before committing
    audio_data = mic.get_audio_data()
    if audio_data and len(audio_data) > 0:
        base64_audio = base64_encode_audio(audio_data)
        if base64_audio:
            audio_event = {
                "type": "input_audio_buffer.append",
                "audio": base64_audio,
            }
            log_ws_event("Outgoing", audio_event)
            await websocket.send(json.dumps(audio_event))
        else:
            logger.debug("No audio data to send")
    else:
        logger.warning("Audio buffer is empty, retrying capture...")
        await asyncio.sleep(0.5)  # Wait briefly before retrying
        audio_data = mic.get_audio_data()
        if audio_data:
            logger.info("Successfully captured audio data on retry.")
            base64_audio = base64_encode_audio(audio_data)
            audio_event = {
                "type": "input_audio_buffer.append",
                "audio": base64_audio,
            }
            log_ws_event("Outgoing", audio_event)
            await websocket.send(json.dumps(audio_event))
        else:
            logger.error("Failed to capture audio data after retry.")

    # Start the response timer, on send
    response_start_time = time.perf_counter()
    await websocket.send(json.dumps({"type": "input_audio_buffer.commit"}))
```
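The snippet repeats the append logic in both branches and sends the commit even when no audio was captured. One possible way to factor this out is sketched below. The helper names `capture_audio` and `append_and_commit` are hypothetical, `base64.b64encode` stands in for the project's `base64_encode_audio` helper (whose implementation is not shown here), and skipping the commit on an empty buffer is a design choice of this sketch rather than behavior taken from the code above.

```python
import asyncio
import base64
import json


async def capture_audio(get_audio, retries=1, delay=0.5, sleep=asyncio.sleep):
    """Return bytes from `get_audio()`, retrying while the buffer is empty.

    `get_audio` is any zero-argument callable, e.g. `mic.get_audio_data`.
    """
    for attempt in range(retries + 1):
        data = get_audio()
        if data:
            return data
        if attempt < retries:
            await sleep(delay)  # Wait briefly before trying again
    return b""


async def append_and_commit(send, audio):
    """Send append and commit events; send nothing if `audio` is empty.

    `send` is any coroutine function taking a string, e.g. `websocket.send`.
    Returns True if the events were sent, False otherwise.
    """
    if not audio:
        return False
    encoded = base64.b64encode(audio).decode("ascii")
    await send(json.dumps({"type": "input_audio_buffer.append",
                           "audio": encoded}))
    await send(json.dumps({"type": "input_audio_buffer.commit"}))
    return True
```

Inside the speech_stopped branch this might be used as `audio = await capture_audio(mic.get_audio_data)` followed by `await append_and_commit(websocket.send, audio)`, with the existing logging and response timer kept around those calls.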