fakufaku/fast_bss_eval

How to evaluate SIR and SDR for mono wav file

Shin-ichi-Takayama opened this issue · 2 comments

Hello.

How do I evaluate SIR and SDR for mono wav files?

I have the following mono wav files.

  • Mixed voice and noise audio
  • Voice audio (ref.wav)
  • Noise audio
  • Inference file (est.wav)

Each wav file is 4 seconds long and sampled at 16 kHz.
When I computed the SIR for the mono files, it came out as Inf.
As I mentioned in Issue #12, the following code gives an SIR of Inf:

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

_, ref = wavfile.read("./data/ref.wav")
_, est = wavfile.read("./data/est.wav")

# add a leading channel axis: shape (1, n_samples)
ref = ref[None, ...]
est = est[None, ...]

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

print('sdr:', sdr)
print('sir:', sir)
print('sar:', sar)

sdr: 14.188884277900977
sir: inf
sar: 14.18888427790095
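The Inf is expected with a single reference. BSS Eval splits the estimate into a target part (what the target reference explains), an interference part (what the other references additionally explain), and an artifact part (everything else). With only ref.wav as a reference, the interference part is empty, so SIR is Inf and SDR reduces to SAR, which fits the nearly identical SDR and SAR printed above. The sketch below illustrates this with a simplified, projection-only decomposition that uses one gain per reference; it is not fast_bss_eval's implementation (the library also lets each reference pass through a short distortion filter). The names `project` and `bss_eval_simplified` and the synthetic signals are hypothetical stand-ins.

```python
import numpy as np


def project(basis, x):
    """Least-squares projection of x onto the span of the rows of `basis`."""
    coef, *_ = np.linalg.lstsq(basis.T, x, rcond=None)
    return basis.T @ coef


def bss_eval_simplified(refs, est, target):
    """Projection-only BSS Eval (one gain per reference, no filters).

    refs: (n_src, n_samples) references, est: (n_samples,) estimate,
    target: index of the reference that est should match.
    """
    refs = np.asarray(refs, dtype=np.float64)
    est = np.asarray(est, dtype=np.float64)
    # part of est explained by the target reference alone
    s_target = project(refs[[target]], est)
    if refs.shape[0] == 1:
        # a single reference: there is nothing that can interfere
        e_interf = np.zeros_like(est)
    else:
        # extra part of est explained once the other references are allowed
        e_interf = project(refs, est) - s_target
    # whatever no reference can explain
    e_artif = est - s_target - e_interf

    def db(num, den):
        # an all-zero denominator gives +inf instead of a warning
        with np.errstate(divide="ignore"):
            return 10 * np.log10(np.sum(num**2) / np.sum(den**2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar


# synthetic 1 s signals at 16 kHz (stand-ins for ref.wav, noise.wav, est.wav)
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
est = speech + 0.1 * noise + 0.01 * rng.standard_normal(16000)

# speech as the only reference: SIR is inf and SDR equals SAR
print(bss_eval_simplified(speech[None, :], est, target=0))
# speech and noise as references: the leftover noise now counts as interference
print(bss_eval_simplified(np.stack([speech, noise]), est, target=0))
```

In the second call the residual noise (scaled by 0.1, i.e. about 20 dB below the speech) moves from the artifact term into the interference term, which is what makes the SIR finite.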

However, I would like to get a meaningful SIR for a mono wav file.
To keep the SIR from being Inf, I split each wav file into four 1-second parts and used them as four channels. Does the following code evaluate SIR and SDR correctly?

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

# each file holds one 1 s segment (16000 samples at 16 kHz)
ref = np.zeros((4, 16000))
est = np.zeros((4, 16000))

# treat each segment as a separate channel
for i in range(4):
    _, ref_temp = wavfile.read(f"./data/ref{i + 1}.wav")
    _, est_temp = wavfile.read(f"./data/est{i + 1}.wav")
    ref[i] = ref_temp
    est[i] = est_temp

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

print('sdr:', sdr.mean())
print('sir:', sir.mean())
print('sar:', sar.mean())

sdr: 16.156123610321156
sir: 28.957842593289392
sar: 16.444840346137177

More generally, what signals should go into each channel of ref and est?
Best regards.

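This is indeed a good question! I don't think splitting the file is the correct way to do it: the other segments are not really interfering sources, and the noise still never appears as a reference.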

In your case, you have access to both the clean speech and the noise, so the best approach is to use both as references. The SIR then measures how much noise leaks into the estimate relative to the speech.

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

# assume all files are mono
_, speech_ref = wavfile.read("./data/ref.wav")
_, noise_ref = wavfile.read("./data/noise.wav")
_, est = wavfile.read("./data/est.wav")

ref = np.stack([speech_ref, noise_ref], axis=0)
# I think `est[None, ...]` should also work, but to be safe, give est
# the same number of channels as ref
est = np.stack([est, est], axis=0)

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

# row 0 holds the metrics of the estimate against the speech reference
print('sdr:', sdr[0])
print('sir:', sir[0])
print('sar:', sar[0])
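One practical snag with this setup: `np.stack` raises a `ValueError` if the three wav files differ in length by even one sample, and `scipy.io.wavfile.read` returns integer arrays (e.g. `int16`) for 16-bit PCM files. A minimal sketch of a hypothetical helper, `make_ref_est` (not part of fast_bss_eval), that checks the inputs are mono, casts them to float64 as a precaution, trims them to a common length, and builds the arrays exactly as in the reply:

```python
import numpy as np


def make_ref_est(speech_ref, noise_ref, est):
    """Hypothetical helper: build (2, n_samples) ref and est arrays from mono signals.

    All signals are cast to float64 and trimmed to the shortest length,
    because np.stack requires arrays of identical shape.
    """
    signals = [np.asarray(x, dtype=np.float64) for x in (speech_ref, noise_ref, est)]
    for x in signals:
        if x.ndim != 1:
            raise ValueError(f"expected a mono (1-D) signal, got shape {x.shape}")
    n = min(len(x) for x in signals)
    speech_ref, noise_ref, est = (x[:n] for x in signals)
    # row 0: speech reference, row 1: noise reference
    ref = np.stack([speech_ref, noise_ref], axis=0)
    # one copy of the estimate per reference, so row 0 is scored against speech
    est = np.stack([est, est], axis=0)
    return ref, est
```

With the files from this thread, it would be called as `ref, est = make_ref_est(speech_ref, noise_ref, est)` right after the three `wavfile.read` calls and before `fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)`. Trimming silently drops samples, so a large length mismatch is probably better treated as an error than trimmed away.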

Thank you for your response.
I was able to evaluate the SIR and SDR for my mono wav files.