openai/openai-realtime-api-beta

With server_vad turn detection, appendInputAudio perf becomes unusable

In server_vad mode, createResponse() is never called, so this.inputAudioBuffer is never reset and keeps growing for the whole session. Each appendInputAudio call merges the new chunk into that buffer by copying it, so every call has to copy more data than the last and performance degrades drastically over time.

Skipping the local merge when turn detection is server_vad seems to fix it:

appendInputAudio(arrayBuffer) {
  if (arrayBuffer.byteLength > 0) {
    this.realtime.send('input_audio_buffer.append', {
      audio: RealtimeUtils.arrayBufferToBase64(arrayBuffer),
    });

    /////////// ADD THIS CODE
    if (this.sessionConfig.turn_detection?.type !== 'server_vad') {
      this.inputAudioBuffer = RealtimeUtils.mergeInt16Arrays(
        this.inputAudioBuffer,
        arrayBuffer,
      );
    }
    /////////// END ADDED CODE
  }
  return true;
}
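
For anyone who wants to check the behaviour without a live session, here is a rough standalone sketch of the patched logic. StubClient, feed, and the local mergeInt16Arrays are hypothetical stand-ins written for this example (they are not the library's code), and the merge helper is only assumed to behave like a copy-both-inputs merge. It shows the local buffer growing when turn detection is off and staying empty under server_vad.

```javascript
// Hypothetical stand-in for RealtimeUtils.mergeInt16Arrays (an assumption):
// allocates a new array and copies both inputs, so each call costs
// time proportional to the combined length.
function mergeInt16Arrays(left, right) {
  const a = left instanceof ArrayBuffer ? new Int16Array(left) : left;
  const b = right instanceof ArrayBuffer ? new Int16Array(right) : right;
  const merged = new Int16Array(a.length + b.length);
  merged.set(a, 0);
  merged.set(b, a.length);
  return merged;
}

// Hypothetical stub with only the fields the patched method touches.
class StubClient {
  constructor(turnDetectionType) {
    this.sessionConfig = {
      turn_detection: turnDetectionType ? { type: turnDetectionType } : null,
    };
    this.inputAudioBuffer = new Int16Array(0);
    this.sent = 0; // counts events that would have gone over the socket
  }

  appendInputAudio(arrayBuffer) {
    if (arrayBuffer.byteLength > 0) {
      this.sent++; // stands in for this.realtime.send(...)
      // Same guard as the proposed fix: no local copy in server_vad mode.
      if (this.sessionConfig.turn_detection?.type !== 'server_vad') {
        this.inputAudioBuffer = mergeInt16Arrays(
          this.inputAudioBuffer,
          arrayBuffer,
        );
      }
    }
    return true;
  }
}

// Feed `chunks` silent chunks of `samplesPerChunk` samples each and
// return the resulting local buffer length in samples.
function feed(client, chunks, samplesPerChunk) {
  for (let i = 0; i < chunks; i++) {
    client.appendInputAudio(new Int16Array(samplesPerChunk).buffer);
  }
  return client.inputAudioBuffer.length;
}

const manualLength = feed(new StubClient(null), 100, 480);
const vadLength = feed(new StubClient('server_vad'), 100, 480);
console.log({ manualLength, vadLength });
```

Without the guard, the server_vad client would accumulate every chunk just like the manual one, and the total bytes copied would grow roughly with the square of the number of chunks.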