Cannot record sound with loopback if silence at start
tez3998 opened this issue · 10 comments
First of all, thank you for the amazing library.
It helps my projects a lot.
Behavior I encountered
I wrote the following program, which simply records the speaker output for 5 seconds via loopback and saves it.
```python
import soundcard as sc
import soundfile as sf

OUTPUT_FILE_NAME = "out.wav"  # output file name.
SAMPLE_RATE = 48000  # [Hz]. sampling rate.
RECORD_SEC = 5  # [sec]. recording duration.

with sc.get_microphone(id=str(sc.default_speaker().name), include_loopback=True).recorder(samplerate=SAMPLE_RATE) as mic:
    # record audio with loopback from default speaker.
    data = mic.record(numframes=SAMPLE_RATE*RECORD_SEC)

sf.write(file=OUTPUT_FILE_NAME, data=data[:, 0], samplerate=SAMPLE_RATE)
```
This program works if the speaker is already playing sound when it starts.
However, it doesn't work if there is silence at the start.
The behavior when there is silence at the start is as follows:
- Run the program.
- The program finishes immediately without recording 5 seconds of speaker output.
My environment
- OS: Windows 11 (x64)
- Python version: 3.10.6
- SoundCard version: 0.4.2
Error
There is no error.
Depending on the sound card, silence is reported either as no data or as silence (zeros). However, soundcard's support for the no-data case has not been released yet, as I didn't have a good test case.
Could you try running your code against the current Git master of soundcard? I believe your issue should be fixed on there. And if it is, I will publish it as a new version as soon as you confirm that it's working as intended.
@bastibe
I appreciate your quick response during your busy time.
Result
I cloned the current master of soundcard and ran the code above on three output devices.
The results are shown in the following table.
| Output device | Was there sound at the start of the code? | Result |
|---|---|---|
| AMD High Definition Audio Device | No | Ended immediately and recorded silence. |
| AMD High Definition Audio Device | Yes | Successfully recorded. |
| Realtek(R) Audio | No | Successfully recorded. |
| Realtek(R) Audio | Yes | Successfully recorded. |
| Pixel Buds A-Series | No | Ended immediately and recorded silence. |
| Pixel Buds A-Series | Yes | Successfully recorded. |
I also encountered the following warning at random times on all output devices, but the code still behaved as shown in the table. The timing was random, but the warning tended to occur when the output device had been switched just before running the code.
```
C:\Users\user\workspace\clone\bastibe\SoundCard\soundcard\mediafoundation.py:750: SoundcardRuntimeWarning: data discontinuity in recording
  warnings.warn("data discontinuity in recording", SoundcardRuntimeWarning)
```
Oh, the endless vagaries of sound drivers on Windows.
Regrettably, I can't debug this issue on my machine, as my sound card behaves like your Realtek. Could you check how this fails in _record_chunk for the affected sound cards?
I could imagine that GetNextPacketSize in _capture_available_frames returns AUDCLNT_E_DEVICE_INVALIDATED.
Alternatively, you could try extending the empty-watcher beyond 10 ms. I have seen Windows sound cards take up to 4 s to wake up in extreme cases, if that's the problem. Perhaps we need to wait until AUDCLNT_E_SERVICE_NOT_RUNNING clears?
However, if so, I still don't know how to proceed in soundcard, as the API does not give an indication of how much silence there was. Soundcard operates on the assumption that you can get a fixed number of samples per second. WASAPI just refusing to return anything breaks that assumption. If you have a reasonable idea of how to deal with that, I'm all ears!
@bastibe
I checked the behavior of SoundCard with Pixel Buds A-Series as the output device and no sound at the start of the code.
The results of testing your suggestions
The value returned from GetNextPacketSize in _capture_available_frames()
Contrary to your expectation, GetNextPacketSize always returned 0.
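When checking a return value like this, it helps to print the HRESULT in hex: 0 is S_OK (no error), while the AUDCLNT error codes start with 0x8889. A minimal sketch; describe_hresult and the code table are hypothetical helpers, not part of soundcard. The values are taken from the Windows SDK's audioclient.h and are worth double-checking against your SDK, and it assumes the binding hands back HRESULT as a signed 32-bit integer, so failures show up as negative numbers:

```python
# Selected HRESULTs from audioclient.h (FACILITY_AUDCLNT = 0x889).
# Verify these against your Windows SDK headers.
AUDCLNT_CODES = {
    0x00000000: "S_OK",
    0x08890001: "AUDCLNT_S_BUFFER_EMPTY",  # success code: no data available
    0x88890004: "AUDCLNT_E_DEVICE_INVALIDATED",
    0x88890010: "AUDCLNT_E_SERVICE_NOT_RUNNING",
}


def describe_hresult(hr: int) -> str:
    """Return a readable name for an HRESULT, whether passed signed or unsigned."""
    unsigned = hr & 0xFFFFFFFF  # map negative (signed 32-bit) values to 0x8xxxxxxx
    name = AUDCLNT_CODES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_hresult(0))            # 0x00000000 (S_OK)
print(describe_hresult(-2004287484))  # 0x88890004 (AUDCLNT_E_DEVICE_INVALIDATED)
```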
Extending the empty-watcher to more than 4s
I extended the empty-watcher to 5 s, and the code ended about 5 s after it started.
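An empty-watcher of this kind is essentially a poll loop with a deadline, which explains the observed run time: if the device never delivers data, the loop only gives up once the deadline passes. A minimal sketch, not soundcard's actual code; wait_for_data is hypothetical, and poll is an assumed stand-in for something like _capture_available_frames() that returns the number of available frames (0 when nothing arrives):

```python
import time
from typing import Callable, Optional


def wait_for_data(
    poll: Callable[[], int], timeout_s: float, interval_s: float = 0.001
) -> Optional[int]:
    """Poll until frames are available or timeout_s elapses.

    Returns the available frame count, or None if the deadline passes first.
    """
    deadline = time.perf_counter() + timeout_s
    while True:
        frames = poll()
        if frames > 0:
            return frames
        if time.perf_counter() >= deadline:
            return None
        # As observed in this thread, this may sleep 5-15 ms on Windows, not 1 ms.
        time.sleep(interval_s)
```

With timeout_s=5.0 and a device that stays silent, the call returns None only after roughly 5 s, which matches the behavior described above; the timeout alone cannot tell "slow to wake up" apart from "silent".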
Waiting until AUDCLNT_E_SERVICE_NOT_RUNNING clears
I don't know how to test this, as I lack knowledge about audio APIs. Sorry about that.
What I noticed
time.sleep() cannot sleep for 1ms
I noticed that time.sleep(0.001) actually sleeps not for 1 ms but for about 5-15 ms. This answer on Stack Overflow says the smallest interval you can sleep for is about 10-13 ms. If so, we need to use another method.
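The sleep granularity is easy to measure on a given machine by timing time.sleep() against the high-resolution time.perf_counter(). A small sketch; measure_sleep is a hypothetical helper, and the numbers it prints depend on the OS, Python version, and current timer settings:

```python
import statistics
import time


def measure_sleep(requested_s: float, trials: int = 20) -> list[float]:
    """Return how long time.sleep(requested_s) actually took, in milliseconds."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(requested_s)
        durations.append((time.perf_counter() - start) * 1_000)
    return durations


if __name__ == "__main__":
    samples = measure_sleep(0.001)
    print(f"requested 1 ms: median {statistics.median(samples):.2f} ms, "
          f"max {max(samples):.2f} ms")
```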
The reason the code ends immediately and records silence when there is no sound at the start of the code on Pixel Buds A-Series
The behavior of SoundCard in this case is as follows.
- If there is no sound at the start of the code, _record_chunk() returns a zero-sized array.
- The condition if len(chunk) == 0 in record() is True.
- At this point, required_frames is 480000 and recorded_frames is 0, so chunk becomes a required_frames-sized array filled with zeros.
- Now the condition while recorded_frames < required_frames: in record() is False, so the code exits the while loop and record() returns.
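The sequence above can be reproduced with a simplified model of the loop. This is a sketch, not soundcard's actual code: record and silent_device are hypothetical stand-ins, and it assumes required_frames counts interleaved samples (frames × channels), which would explain why a 5 s request at 48 kHz in stereo shows up as 480000. It shows that a single empty chunk makes the loop pad the entire remaining request with zeros and return at once:

```python
import numpy as np


def record(record_chunk, numframes: int, channels: int) -> np.ndarray:
    """Simplified model of the record() loop described above."""
    required_frames = numframes * channels  # assumed: counted in interleaved samples
    recorded_frames = 0
    recorded_data = []
    while recorded_frames < required_frames:
        chunk = record_chunk()
        if len(chunk) == 0:
            # No data: the whole remainder is filled with zeros in one go.
            chunk = np.zeros(required_frames - recorded_frames, dtype="float32")
        recorded_data.append(chunk)
        recorded_frames += len(chunk)
    data = np.concatenate(recorded_data)[:required_frames]
    return data.reshape(-1, channels)


calls = []


def silent_device():
    calls.append(1)
    return np.zeros(0, dtype="float32")  # driver reports "no data", not zeros


data = record(silent_device, numframes=48_000 * 5, channels=2)
print(len(calls), data.shape)  # 1 (240000, 2)
```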
That's very interesting, thank you!
If I understand this correctly, it means that (some variants of) the Windows audio API just return no data when none is available. That is not a problem in itself, but it breaks the assumption of soundcard, which would rather return zeros than no data. We can fudge that by just making up some zeros if no data is available.
However, the question then becomes: How many zeros should we return? Because the length of the output is how soundcard expresses how much time has passed. In this case, it is probably acceptable if the number of zeros is off by some margin of error. Ideally, we'd ask the audio driver for a current "time", but as far as I can tell, no such API is available.
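The estimate boils down to frames = samplerate × elapsed seconds, multiplied by the channel count for an interleaved buffer. A minimal sketch, assuming the elapsed time is measured with time.perf_counter_ns() and is therefore in nanoseconds; silence_samples is a hypothetical helper, not part of soundcard:

```python
def silence_samples(samplerate: int, elapsed_ns: int, channels: int) -> int:
    """Number of interleaved zero samples covering elapsed_ns of silence."""
    # ns -> s is a division by 1e9; integer floor division keeps a valid length.
    frames = samplerate * elapsed_ns // 1_000_000_000
    return frames * channels


print(silence_samples(48_000, 50_000_000, 2))  # 4800 (50 ms of stereo at 48 kHz)
```

Any jitter in the measured gap turns directly into a timing error of the same size in the output, which is the margin of error discussed here.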
As a workaround, change _record_chunk like this:
```python
def _record_chunk(self):
    # skip docstring for this example...
    start_time = 0  # in the real implementation, make this self.start_time so we don't skip processing time
    while not self._capture_available_frames():
        if start_time == 0:
            start_time = time.perf_counter_ns()
        now = time.perf_counter_ns()
        # no data for 50 ms: give up and return zeros.
        if now - start_time > 50_000_000:
            ppMixFormat = _ffi.new('WAVEFORMATEXTENSIBLE**')
            hr = self._ptr[0][0].lpVtbl.GetMixFormat(self._ptr[0], ppMixFormat)
            _com.check_error(hr)
            samplerate = ppMixFormat[0][0].nSamplesPerSec  # in the real implementation, cache samplerate in self.
            num_samples = samplerate * (now - start_time) / 1_000_000
            return numpy.zeros([len(set(self.channelmap)) * num_samples], dtype='float32')
        time.sleep(0.001)
    # continue with the rest of the function below the while loop...
```
This should give you a reasonable estimate of the correct number of zeros. If this solves your problem, I'll code up a proper implementation.
@bastibe
Thanks for the great info.
I was able to write code that works correctly on all three output devices.
The result of testing your code on my machine
Debugging your code
I changed the following lines because they raised errors:
```python
# your original code
samplerate = ppMixFormat[0][0].nSamplesPerSec
# modified code
samplerate = ppMixFormat[0][0].Format.nSamplesPerSec
```

```python
# your original code
num_samples = samplerate * (now - start_time) / 1_000_000
# modified code
num_samples = int(samplerate * (now - start_time) / 1_000_000)
```
Result
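Your code ended immediately and recorded silence, because numpy.zeros() returned an array large enough to satisfy the whole recording at once. now - start_time is in nanoseconds, so dividing it by 1_000_000 gives milliseconds instead of seconds, and the number of zeros comes out 1000 times too large.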
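A quick arithmetic check with this thread's numbers shows the size of the overshoot. It assumes a stereo stream (2 channels) and the 50 ms give-up threshold, and compares against the 480000 interleaved samples requested for 5 s:

```python
SAMPLERATE = 48_000
CHANNELS = 2
GAP_NS = 50_000_000                      # the 50 ms give-up threshold, in ns
REQUIRED = SAMPLERATE * 5 * CHANNELS     # 480000 samples for 5 s of stereo

as_written = int(SAMPLERATE * GAP_NS / 1_000_000) * CHANNELS     # ns / 1e6 = ms
corrected = int(SAMPLERATE * GAP_NS / 1_000_000_000) * CHANNELS  # ns / 1e9 = s

print(as_written, corrected, REQUIRED)  # 4800000 4800 480000
```

A single padded chunk of 4800000 samples is ten times the whole request, so record() returns after the first 50 ms gap.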
The code which worked correctly
Code
The while loop in _record_chunk() in mediafoundation.py:
```python
start_time = 0  # in the real implementation, make this self.start_time so we don't skip processing time
while not self._capture_available_frames():
    if start_time == 0:
        start_time = time.perf_counter_ns()
    now = time.perf_counter_ns()
    # no data for 50 ms: give up and return zeros.
    if now - start_time > 50_000_000:
        ppMixFormat = _ffi.new('WAVEFORMATEXTENSIBLE**')
        hr = self._ptr[0][0].lpVtbl.GetMixFormat(self._ptr[0], ppMixFormat)
        _com.check_error(hr)
        samplerate = ppMixFormat[0][0].Format.nSamplesPerSec  # in the real implementation, cache samplerate in self.
        num_samples_per_ms = samplerate / 1_000
        num_channels = len(set(self.channelmap))
        giveup_ms = 50
        return numpy.zeros(int(num_samples_per_ms * giveup_ms * num_channels), dtype='float32')
    # rewrote time.sleep(0.001), because time.sleep(0.001) cannot sleep for 1ms.
    remaining_time = 1
    sleep_ms = 1
    _start = time.perf_counter()
    while remaining_time > 0:
        elapsed_time = (time.perf_counter() - _start) * 1_000
        remaining_time = sleep_ms - elapsed_time
```
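The busy-wait keeps a CPU core fully occupied while it waits. A common alternative for longer waits is a hybrid: sleep coarsely for most of the interval and spin on time.perf_counter() only for the last stretch. A sketch, not tested against soundcard; precise_sleep is hypothetical, and spin_threshold is an assumed margin that should exceed the sleep granularity measured on the machine (5-15 ms in this thread). For the 1 ms wait used here it reduces to the same pure busy-wait, so the benefit only shows for waits longer than the threshold:

```python
import time


def precise_sleep(seconds: float, spin_threshold: float = 0.020) -> None:
    """Sleep for `seconds`: time.sleep() for the bulk, then a short busy-wait.

    spin_threshold is an assumed safety margin larger than the OS sleep granularity.
    """
    deadline = time.perf_counter() + seconds
    # Coarse phase: give the CPU back while comfortably before the deadline.
    remaining = deadline - time.perf_counter()
    if remaining > spin_threshold:
        time.sleep(remaining - spin_threshold)
    # Fine phase: busy-wait on the high-resolution counter until the deadline.
    while time.perf_counter() < deadline:
        pass


if __name__ == "__main__":
    start = time.perf_counter()
    precise_sleep(0.001)
    print(f"slept {(time.perf_counter() - start) * 1_000:.3f} ms")
```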
Test code
I added some code that prints diagnostic info.
```python
import soundcard as sc
import soundfile as sf
import time

OUTPUT_FILE_NAME = "out.wav"  # output file name.
SAMPLE_RATE = 48_000  # [Hz]. sampling rate.
RECORD_SEC = 5  # [sec]. recording duration.

print(f"output device: {str(sc.default_speaker().name)}")
with sc.get_microphone(id=str(sc.default_speaker().name), include_loopback=True).recorder(samplerate=SAMPLE_RATE) as mic:
    _start_time: float = time.perf_counter()
    # record audio with loopback from default speaker.
    data = mic.record(numframes=SAMPLE_RATE*RECORD_SEC)
    # output info
    print("\n-- info --")
    print(f"len of data: {len(data)}")
    print(f"elapsed time: {time.perf_counter() - _start_time}s")
    print("-- -- -- --\n")

sf.write(file=OUTPUT_FILE_NAME, data=data[:, 0], samplerate=SAMPLE_RATE)
```
Result
Initially, the code recorded silence and then recorded sound from YouTube.
In this demo, the code ended in 5.076047300011851s.
soundcard_bug.mp4
It worked fine on my three output devices!
Perfect! Thank you for your feedback!