craffel/mir_eval

[QUESTION] About BASS and PEAQ, PESQ

loretoparisi opened this issue · 2 comments

I ran several MSS (music source separation) algorithms, such as open-unmix and Spleeter, and evaluated them on the test sets with the mir_eval scripts. This evaluation compares the extracted sources against the reference sources and is described in https://github.com/craffel/mir_eval/blob/master/mir_eval/separation.py as an "...attempt to measure the perceptual quality of the separation".
Perceptual evaluation of audio quality (PEAQ, PESQ, PEQV) is to some extent standardized and relies on psychoacoustic/perceptual features to objectively measure perceived audio quality (1). For the BASS problem, by contrast, we use the distortions between the estimated source and the reference source to compute the SDR, SAR, SIR, and SNR metrics. Can we define a relation between these two kinds of evaluation?
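To make the contrast concrete: the BASS metrics are plain energy ratios from a projection-based decomposition, with no psychoacoustic model involved. The sketch below is a minimal illustration under stated assumptions, not mir_eval's code. It allows only a time-invariant gain as "allowed distortion", while mir_eval's `bss_eval` allows short time-invariant filters. The function names `bss_decompose` and `sdr_sir_sar` are hypothetical, and the signals are synthetic noise standing in for vocals and accompaniment.

```python
import numpy as np

def bss_decompose(estimate, references, j):
    """Split `estimate` into target, interference and artifact parts for
    source j, allowing only a time-invariant gain (simplified, hypothetical)."""
    s_j = references[j]
    # Orthogonal projection of the estimate onto the true source j: the target.
    s_target = (np.dot(estimate, s_j) / np.dot(s_j, s_j)) * s_j
    # Least-squares projection onto the span of all reference sources.
    coeffs, *_ = np.linalg.lstsq(references.T, estimate, rcond=None)
    p_all = references.T @ coeffs
    e_interf = p_all - s_target  # part explained by the *other* sources
    e_artif = estimate - p_all   # part that no reference source explains
    return s_target, e_interf, e_artif

def energy_ratio_db(num, den):
    """10 * log10 of the energy ratio between two signals."""
    return 10 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

def sdr_sir_sar(estimate, references, j):
    s_target, e_interf, e_artif = bss_decompose(estimate, references, j)
    sdr = energy_ratio_db(s_target, e_interf + e_artif)
    sir = energy_ratio_db(s_target, e_interf)
    sar = energy_ratio_db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Usage with synthetic signals: the estimate of source 0 leaks 10 % of
# source 1 (interference) and contains additive noise (artifacts).
rng = np.random.default_rng(0)
n = 44100  # one second at 44.1 kHz
refs = rng.standard_normal((2, n))  # hypothetical "vocals" and "accompaniment"
est = 0.9 * refs[0] + 0.1 * refs[1] + 0.05 * rng.standard_normal(n)
sdr, sir, sar = sdr_sir_sar(est, refs, j=0)
print(f"SDR={sdr:.1f} dB  SIR={sir:.1f} dB  SAR={sar:.1f} dB")
```

Nothing in these ratios models masking or loudness, which is exactly what PEAQ and PESQ add. Two estimates with the same SIR can therefore sound quite different to a listener.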
Specifically, if we used PESQ (Perceptual Evaluation of Speech Quality) for the separated vocals and PEAQ for the mixture and the separated accompaniment tracks, what considerations would apply when comparing these two kinds of evaluation?

Thank you.

We typically didn't use perceptual metrics such as PEAQ because the distortions introduced by source separation methods are still huge compared to those in audio coding, which is what PESQ and PEAQ were designed for. It may already be possible to run these metrics on current SOTA methods, but in any case the results would have to be validated against human listening tests, which is a lot of work and requires professional equipment. Furthermore, the questions asked of listeners to distinguish between quality, interference, and artifacts are important ones. Have a look at this more recent paper.

Beyond that, I think this conversation is out of scope for mir_eval, unless you would propose implementing PEAQ here (which is a lot of work) ;-)

@faroit Thanks a lot, Fabian. Yes, in fact I was thinking of doing specific research on that with a researcher (and musician) in the psychoacoustics field, supported by the right equipment :) Maybe this could lead to an implementation in mir_eval :) I will start from your references and am now looking into PEASS.
Thank you very much for the pointers, closing for now.