craffel/mir_eval

Always getting negative SDR values when using BSS Eval matlab toolbox

YiyuLuo opened this issue · 7 comments

@faroit
Hello!
When I use the "bss_eval_sources.m" function to evaluate how well clean speech is recovered from speech mixtures, I always get negative SDR values. This is strange, because the separated signals sound quite good when I listen to them.

My code is as follows:
fs = 22050; duration = 8; nsamp = fs * duration;
gt_1 = audioread(['C:\Users\samsung\Desktop\gt_100_L.mp4'], [1, nsamp]);
gt_1 = gt_1';
gt_2 = audioread(['C:\Users\samsung\Desktop\gt_100_R.mp4'], [1, nsamp]);
gt_2 = gt_2';
sep_1 = audioread(['C:\Users\samsung\Desktop\sep_100_L.mp4'], [1, nsamp]);
sep_1 = sep_1';
sep_2 = audioread(['C:\Users\samsung\Desktop\sep_100_R.mp4'], [1, nsamp]);
sep_2 = sep_2';
se = [sep_1; sep_2];
s = [gt_1; gt_2];
[SDR, SIR, SAR, perm] = bss_eval_sources(se, s);

sep_1 and sep_2 are the two separated speech signals. gt_1 and gt_2 are the two clean (ground-truth) speech signals.
I get results like this: SDR=[-15.1; -18.3], SAR=[-16.1, -19.6], SIR=[6.4, 4.6]
I have no idea why I get negative and low SDR and SAR values.

What a coincidence. I'm currently having the same problem, but with the Python implementation, mir_eval. I'm pretty sure it's not the fault of the mir_eval toolkit. We are probably using the toolkit incorrectly somehow.

After I changed the audio format to wav, the problem still exists. Maybe some audio pre-processing is necessary before evaluating SDR, but I don't know how to do it.

In my case the audio signal itself was the cause, so the evaluation toolkit works as intended.

I finally figured out the problem. In my case, the separated speech signals were ahead of the clean signals in phase, so the SDR values were always negative. After fixing this, the eval code works.

@YiyuLuo
I have the same problem. May I ask how you fixed this? Thank you.

@xin-h963 this seems like a "simple" programming error. In the case of @YiyuLuo, the reason was probably that some offsets were wrong, meaning that at time t the phase information from time t-2 was used instead.

This is probably not audible with speech. You can try to run your processing on music data and see whether the audio sounds good.
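To check for such an offset, one option is to cross-correlate each estimate with its reference and shift it back before evaluating. Below is a minimal sketch in plain Python, not taken from BSS Eval or mir_eval. The helper names `estimate_lag`, `undo_lag` and `simple_sdr` are made up for illustration, and `simple_sdr` is only a plain energy ratio, not the BSS Eval decomposition.

```python
import math
import random


def estimate_lag(reference, estimate, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    Positive: the estimate is delayed relative to the reference.
    Negative: the estimate is advanced relative to the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(reference)):
            u = t + lag
            if 0 <= u < len(estimate):
                score += reference[t] * estimate[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def undo_lag(signal, lag):
    """Shift a list of samples by -lag, zero-padding to keep its length."""
    if lag >= 0:
        return signal[lag:] + [0.0] * lag
    return [0.0] * (-lag) + signal[:len(signal) + lag]


def simple_sdr(reference, estimate):
    """Energy-ratio SDR in dB (hypothetical helper, NOT the BSS Eval SDR)."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy data: a noise-like reference and an "estimate" advanced by 2 samples.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(400)]
estimate = reference[2:] + [0.0, 0.0]

lag = estimate_lag(reference, estimate, max_lag=10)
aligned = undo_lag(estimate, lag)

print("estimated lag:", lag)
print("SDR before alignment:", simple_sdr(reference, estimate))
print("SDR after alignment:", simple_sdr(reference, aligned))
```

On this toy data the misaligned signal scores a negative energy-ratio SDR even though it is a perfect copy of the reference, which is the same symptom described above. For real recordings you would run this per source, with `max_lag` chosen to cover the largest delay you expect from your processing chain.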

I'm facing the same issue, even though the signals seem very similar. Could anyone advise me on how to fix it? Full details can be found here