Delayed captured audio in Application Loopback sample


Hello!

I'm experiencing a noticeable delay when playing back the audio captured by the Application Loopback sample. I noticed this when I played the captured audio over the original process's audio, through the same audio endpoint, at the same time.

I would like to know if the delay I'm experiencing is caused by the loopback capture (meaning there's a significant delay between the original process publishing an audio packet and the capture interface receiving it), and if there's a way to lower it to an unnoticeable amount.

I forked and modified the original Application Loopback sample to be able to play back the captured audio. The code for this modified sample is here: https://github.com/naguileraleal/Windows-classic-samples/tree/main/applicationloopbackaudio

I'll now present my modifications to the original sample and the tests I did to determine what's causing this issue.

Modifications to Application Loopback

Playing back captured audio

To determine whether a delay exists between a process's audio and the captured audio, the latter can be played back over the former.
To achieve this, I initialized a new Audio Client against the same Audio Endpoint the captured process is using.
https://github.com/naguileraleal/Windows-classic-samples/blob/4310b5ddb06465f2c1a4d6dd004bc8262b4f8033/applicationloopbackaudio/cpp/ApplicationLoopback.cpp#L372C19-L372C19
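For reference, below is a minimal sketch of that initialization, assuming the default render endpoint, the engine's mix format, and plain HRESULT error handling; the function name is mine, and the sample's actual code (linked above) differs in the details.

```cpp
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Illustrative helper: set up a shared-mode render client on the default
// render endpoint, using the engine's mix format.
HRESULT InitializeRenderClient(ComPtr<IAudioClient>& audioClient,
                               ComPtr<IAudioRenderClient>& renderClient)
{
    ComPtr<IMMDeviceEnumerator> enumerator;
    HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr,
                                  CLSCTX_ALL, IID_PPV_ARGS(&enumerator));
    if (FAILED(hr)) return hr;

    // The same endpoint the captured process is rendering to.
    ComPtr<IMMDevice> device;
    hr = enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);
    if (FAILED(hr)) return hr;

    hr = device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, &audioClient);
    if (FAILED(hr)) return hr;

    // Use the engine's mix format so the render path adds no extra conversion.
    WAVEFORMATEX* mixFormat = nullptr;
    hr = audioClient->GetMixFormat(&mixFormat);
    if (FAILED(hr)) return hr;

    // A 0 buffer duration lets the engine pick the minimum size it requires.
    hr = audioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 0, 0, mixFormat, nullptr);
    CoTaskMemFree(mixFormat);
    if (FAILED(hr)) return hr;

    return audioClient->GetService(IID_PPV_ARGS(&renderClient));
}
```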
Then, in the CLoopbackCapture::OnAudioSampleRequested() method, after calling IAudioCaptureClient::GetBuffer, I added calls to IAudioRenderClient::GetBuffer and IAudioRenderClient::ReleaseBuffer. See https://github.com/naguileraleal/Windows-classic-samples/blob/4310b5ddb06465f2c1a4d6dd004bc8262b4f8033/applicationloopbackaudio/cpp/LoopbackCapture.cpp#L460
This plays back each captured packet through the output endpoint right after it is captured.
This callback is called every 10 ms. The whole CLoopbackCapture::OnAudioSampleRequested() execution takes 1.5 ms at worst and 1 ms on average, including the resampling step I'll describe next.
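In outline, the callback body does something like the sketch below. The function and the ResampleToRenderFormat placeholder are hypothetical names of mine, standing in for the actual code linked above:

```cpp
#include <audioclient.h>
#include <cstring>

// Placeholder for the Media Foundation resampling step described in the next
// section; returns the number of frames written to *resampled.
UINT32 ResampleToRenderFormat(const BYTE* in, UINT32 inFrames, BYTE** resampled);

// Illustrative: pull one captured packet, convert it, and immediately push it
// into the render client's buffer.
HRESULT ForwardOnePacket(IAudioCaptureClient* captureClient,
                         IAudioRenderClient* renderClient,
                         UINT32 renderFrameBytes)
{
    BYTE* captureData = nullptr;
    UINT32 framesAvailable = 0;
    DWORD flags = 0;
    UINT64 devicePosition = 0;
    UINT64 qpcPosition = 0;

    HRESULT hr = captureClient->GetBuffer(&captureData, &framesAvailable, &flags,
                                          &devicePosition, &qpcPosition);
    if (FAILED(hr) || framesAvailable == 0) return hr;

    BYTE* resampled = nullptr;
    UINT32 outFrames = ResampleToRenderFormat(captureData, framesAvailable, &resampled);

    BYTE* renderData = nullptr;
    if (SUCCEEDED(renderClient->GetBuffer(outFrames, &renderData)))
    {
        memcpy(renderData, resampled, outFrames * renderFrameBytes);
        renderClient->ReleaseBuffer(outFrames, 0);
    }

    return captureClient->ReleaseBuffer(framesAvailable);
}
```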

Resampling captured samples

Because the captured sample format is not always compatible with the formats the output Audio Client supports, a resampling step is needed between capture and output. I implemented this using Media Foundation. The resampling is performed sequentially, after capturing a packet and before pushing it to the output client's buffer, in CLoopbackCapture::OnAudioSampleRequested().
See https://github.com/naguileraleal/Windows-classic-samples/blob/4310b5ddb06465f2c1a4d6dd004bc8262b4f8033/applicationloopbackaudio/cpp/LoopbackCaptureBase.cpp#L70 for the implementation of the resampling function.
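For context, here is a minimal sketch of configuring Media Foundation's resampler DSP (CLSID_CResamplerMediaObject) as an IMFTransform. This is my illustration, not the sample's exact code, and it assumes MFStartup has already been called:

```cpp
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")

using Microsoft::WRL::ComPtr;

// Illustrative helper: create the resampler MFT and set its input/output
// formats from the capture and render WAVEFORMATEX structures.
HRESULT CreateResampler(const WAVEFORMATEX* inFormat, const WAVEFORMATEX* outFormat,
                        ComPtr<IMFTransform>& resampler)
{
    HRESULT hr = CoCreateInstance(CLSID_CResamplerMediaObject, nullptr,
                                  CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&resampler));
    if (FAILED(hr)) return hr;

    ComPtr<IMFMediaType> inType;
    hr = MFCreateMediaType(&inType);
    if (FAILED(hr)) return hr;
    hr = MFInitMediaTypeFromWaveFormatEx(inType.Get(), inFormat,
                                         sizeof(WAVEFORMATEX) + inFormat->cbSize);
    if (FAILED(hr)) return hr;

    ComPtr<IMFMediaType> outType;
    hr = MFCreateMediaType(&outType);
    if (FAILED(hr)) return hr;
    hr = MFInitMediaTypeFromWaveFormatEx(outType.Get(), outFormat,
                                         sizeof(WAVEFORMATEX) + outFormat->cbSize);
    if (FAILED(hr)) return hr;

    hr = resampler->SetInputType(0, inType.Get(), 0);
    if (FAILED(hr)) return hr;
    return resampler->SetOutputType(0, outType.Get(), 0);
}
```

Each captured packet is then wrapped in an IMFSample and run through IMFTransform::ProcessInput/ProcessOutput before being written to the output client's buffer.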

Size of the IAudioClient buffer

Originally, the buffer of the IAudioClient that performs the capture (a.k.a. the capture client) was 2 seconds long, as set in its initialization in the original sample. I changed this value to zero, since the documentation for IAudioClient::Initialize states that the method ensures the audio buffer is big enough to meet the audio engine's requirements.
When I call IAudioClient::GetBufferSize on this client, it returns 0. Why?
I also encountered some undocumented behaviour while calling other methods on the capture client: both IAudioClient::GetStreamLatency and IAudioClient::GetDevicePeriod return "not implemented" (presumably E_NOTIMPL).

I also did this for the buffer of the IAudioClient that outputs the captured audio to the audio engine (a.k.a. the output client). In this case, after initializing it, a call to IAudioClient::GetBufferSize returns a buffer size of 1056 audio frames, and GetStreamLatency and GetDevicePeriod return valid values.
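To make the comparison concrete, this is roughly what I'm doing, in a sketch that omits the process-loopback activation and stream flags the capture client actually uses; the commented results are what I observe:

```cpp
#include <audioclient.h>

// Illustrative: initialize with a 0 buffer duration and query what the audio
// engine actually set up.
HRESULT InspectClient(IAudioClient* audioClient, const WAVEFORMATEX* format)
{
    HRESULT hr = audioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0,
                                         0 /* hnsBufferDuration */,
                                         0 /* hnsPeriodicity */,
                                         format, nullptr);
    if (FAILED(hr)) return hr;

    // Returns 1056 frames on the output client, but 0 on the capture client.
    UINT32 bufferFrames = 0;
    hr = audioClient->GetBufferSize(&bufferFrames);

    // Valid on the output client; "not implemented" on the capture client.
    REFERENCE_TIME latency = 0;
    hr = audioClient->GetStreamLatency(&latency);

    // Likewise valid on the output client only.
    REFERENCE_TIME defaultPeriod = 0, minPeriod = 0;
    return audioClient->GetDevicePeriod(&defaultPeriod, &minPeriod);
}
```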

Checking the production timestamp of the audio frames

In CLoopbackCapture::OnAudioSampleRequested(), when calling IAudioCaptureClient::GetBuffer on the capture client, the pu64DevicePosition parameter should, as I understand the documentation, return the position of the audio packet relative to the beginning of the stream. In my tests, this value is always 0. Why?
On the other hand, the pu64QPCPosition parameter returns a valid value. Comparing the values between successive calls to CLoopbackCapture::OnAudioSampleRequested() shows a ~10 ms difference between packets, even while I'm hearing a noticeable delay between the original and the captured audio!
Meanwhile, IAudioClient::GetCurrentPadding returns 0 for both the output and the capture client, meaning the capture client has no packets to give me and the output client has no packets queued for the audio engine. If both of those things are true, shouldn't I be hearing the latest captured audio? Does this mean the capture path itself is slow?
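In code, the check looks roughly like this (member-style state reduced to a parameter; the names are mine). Per the IAudioCaptureClient::GetBuffer documentation, pu64QPCPosition is reported in 100-nanosecond units, so deltas divide by 10,000 to give milliseconds:

```cpp
#include <audioclient.h>
#include <cstdio>

// Illustrative: compare the QPC timestamps of successive packets and query
// the padding of both clients. The comments show the values I observe.
void CheckPacketTimestamps(IAudioCaptureClient* captureClient,
                           IAudioClient* captureAudioClient,
                           IAudioClient* renderAudioClient,
                           UINT64& lastQpcPosition)
{
    BYTE* data = nullptr;
    UINT32 framesAvailable = 0;
    DWORD flags = 0;
    UINT64 devicePosition = 0;   // always comes back 0 in my tests
    UINT64 qpcPosition = 0;      // advances by ~100,000 (10 ms) per packet

    if (FAILED(captureClient->GetBuffer(&data, &framesAvailable, &flags,
                                        &devicePosition, &qpcPosition)))
        return;

    printf("packet delta: %.2f ms\n", (qpcPosition - lastQpcPosition) / 10000.0);
    lastQpcPosition = qpcPosition;

    UINT32 capturePadding = 0, renderPadding = 0;
    captureAudioClient->GetCurrentPadding(&capturePadding);   // returns 0
    renderAudioClient->GetCurrentPadding(&renderPadding);     // returns 0

    captureClient->ReleaseBuffer(framesAvailable);
}
```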

Furthermore, I can deliberately make the delay grow by pausing the terminal the sample is running in. When I pause it (by clicking on the terminal), the captured audio playback stops. When I hit Enter, playback resumes, this time with a greater delay. But there's more: the delay is not equal to the time the process was paused. It varies, and there seems to be a maximum possible delay. Sometimes, when a delay exists, pausing and resuming makes it decrease.
I believe this has to do with the buffer sizes, but I cannot fully explain this behaviour.

Implementing a "Synchronous" Loopback Capture

The LoopbackCaptureSync class implemented in my sample code does the same thing as the CLoopbackCapture class, but without using Media Foundation's work queues and without waiting for "Sample Ready" events. I was trying to simplify the capture process as much as possible to see the cause of the delay more clearly. Sadly, it did not change a thing.
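In outline, the synchronous path is just a polling loop like the sketch below (the names are mine; HandlePacket stands in for the resample-and-render step shown earlier):

```cpp
#include <windows.h>
#include <audioclient.h>

// Placeholder for the resample-and-render step.
void HandlePacket(const BYTE* data, UINT32 frames);

// Illustrative polling loop: no Media Foundation work queues and no
// sample-ready event, just drain whatever the capture client has buffered.
void CaptureLoop(IAudioCaptureClient* captureClient, volatile bool& keepCapturing)
{
    while (keepCapturing)
    {
        UINT32 packetFrames = 0;
        if (FAILED(captureClient->GetNextPacketSize(&packetFrames)))
            break;

        if (packetFrames == 0)
        {
            Sleep(1);   // nothing buffered yet; poll again shortly
            continue;
        }

        BYTE* data = nullptr;
        UINT32 framesRead = 0;
        DWORD flags = 0;
        if (SUCCEEDED(captureClient->GetBuffer(&data, &framesRead, &flags,
                                               nullptr, nullptr)))
        {
            HandlePacket(data, framesRead);
            captureClient->ReleaseBuffer(framesRead);
        }
    }
}
```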


Any help is much appreciated!

Using Audacity to reproduce the effect I'm hearing, it seems the captured audio is played back with a 30-50 ms delay with respect to the original audio.