Downmixing to mono behaves differently depending on whether FFMPEG is used for audio loading
fdlm opened this issue · 0 comments
Expected behaviour
When loading a stereo audio file and downmixing it to mono, I expect the resulting amplitudes to not depend on the audio file format, but only on the content.
Actual behaviour
Currently, if a WAV file has the same sample type as the one requested when loading, madmom uses scipy to load it. To downmix the signal to mono, it then uses its own madmom.audio.signal.remix function, which computes the arithmetic mean of the channels.
If there is a mismatch in sample types (e.g. the file is stored as float32 but loaded as float, or stored as 16-bit integers and loaded as float), madmom uses ffmpeg to load the file and, in the same step, to downmix it to mono.
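The downmixing logic of ffmpeg apparently applies a factor of 2 / sqrt(2) relative to the arithmetic mean, i.e. it seems to compute (L + R) / sqrt(2) instead of (L + R) / 2. As a result, the same file yields different amplitudes depending on which loading path is taken.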
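To make the size of the discrepancy concrete, here is a minimal NumPy-only sketch of the two downmixing rules. The function names are hypothetical, and `scaled_downmix` only encodes the scaling inferred from the measurements in this issue; it is not taken from ffmpeg's source or documentation.

```python
import numpy as np


def mean_downmix(stereo):
    """Arithmetic mean of the channels, as described for madmom's remix."""
    return stereo.mean(axis=1)


def scaled_downmix(stereo):
    """Assumed ffmpeg-like rule: (L + R) / sqrt(2). Inferred, not verified."""
    return stereo.sum(axis=1) / np.sqrt(2)


# Synthetic stereo signal, shape (num_samples, num_channels).
rng = np.random.default_rng(0)
stereo = rng.uniform(-1.0, 1.0, size=(1000, 2))

# The ratio is constant wherever the sum of the channels is non-zero.
ratio = np.nanmedian(mean_downmix(stereo) / scaled_downmix(stereo))
print(ratio)
```

Under this assumption the ratio equals sqrt(2) / 2, which is consistent with the value observed in the reproduction steps.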
Steps needed to reproduce the behaviour
import madmom
import numpy as np
# chirp.wav is stored as stereo 32-bit float
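# requested dtype matches the stored one: loaded via scipy, downmixed by madmom
read_wave = madmom.io.load_audio_file('chirp.wav', num_channels=1, dtype=np.float32)[0]
# requested dtype (float64) differs from the stored one: loaded and downmixed by ffmpeg
read_ffmpeg = madmom.io.load_audio_file('chirp.wav', num_channels=1, dtype=float)[0]
print(np.nanmedian(read_wave / read_ffmpeg))  # 0.7071...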
print(np.nanmedian((2 * read_wave / np.sqrt(2)) / read_ffmpeg))  # 1.0
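The snippet above relies on a chirp.wav that is not attached to this issue. As a rough stand-in, a stereo 32-bit float WAV file could be generated as sketched below; the helper name, sample rate, frequencies, and amplitudes are arbitrary assumptions and not the properties of the original file.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import chirp


def write_test_chirp(path, sample_rate=44100, duration=1.0):
    """Write a stereo float32 WAV file with a different chirp per channel."""
    num_samples = int(sample_rate * duration)
    t = np.linspace(0, duration, num_samples, endpoint=False)
    left = 0.5 * chirp(t, f0=100, t1=duration, f1=2000)
    right = 0.25 * chirp(t, f0=200, t1=duration, f1=4000)
    # Shape (num_samples, 2); a float32 array is written as 32-bit float WAV.
    stereo = np.stack([left, right], axis=1).astype(np.float32)
    wavfile.write(path, sample_rate, stereo)
    return stereo


stereo = write_test_chirp('chirp.wav')
```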
Information about installed software
madmom master branch
ffmpeg version 4.4.2-0ubuntu0.22.04.1