yluo42/TAC

how to calculate SI-SNRi in multi-channel speech separation task

In the single-channel speech separation task, SI-SNRi is calculated as follows:
SI-SNR0 = calc_SISNR(source, estimate_source)
SI-SNR1 = calc_SISNR(source, mixture)
SI-SNRi = SI-SNR0 - SI-SNR1
But in the multi-channel speech separation task, what are the equivalents of source, estimate_source, and mixture?
Assume that there are 2 speakers to be separated. Then source is the 2 raw speech signals, each of which contains 1 speaker, and estimate_source is the 2 separated signals, each of which contains 1 speaker.
If so, how do I calculate the SI-SNR between source and mixture, given that the mixture contains 2 audio channels?

I hope I've made myself clear on this issue.

For both ad-hoc and fixed arrays, we assume a reference microphone (channel 0 by default) for both the training and inference phases. The unprocessed speech (i.e., the mixture) used in the calculation of SI-SNRi is thus the mixture signal at the reference microphone.

Note that in other beamforming methods, e.g. MVDR/MWF, the calculation of SI-SNRi is done in the same way by using the mixture signal at a reference channel.
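
The reference-channel convention described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code used in this repository: the function names `si_snr` and `si_snr_improvement`, the `ref_channel` argument, and the array shapes are hypothetical. It also assumes that the clean sources are the source signals as observed at the reference microphone, and that the estimates are already in the same speaker order as the sources (no permutation search is done here).

```python
import numpy as np


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    # Remove the mean so a DC offset does not affect the result.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_improvement(sources, estimates, mixture, ref_channel=0):
    """Mean SI-SNRi over speakers, in dB.

    sources:   (n_speakers, n_samples) clean signals (assumed: at the reference mic)
    estimates: (n_speakers, n_samples) separated signals (assumed: same speaker order)
    mixture:   (n_channels, n_samples) multi-channel mixture
    """
    # The unprocessed baseline is the mixture at the reference microphone only.
    ref_mixture = mixture[ref_channel]
    improvements = [
        si_snr(src, est) - si_snr(src, ref_mixture)
        for src, est in zip(sources, estimates)
    ]
    return float(np.mean(improvements))
```

With a 2-speaker, 2-channel mixture, `mixture` would have shape `(2, n_samples)` and only `mixture[0]` enters the baseline term, so the formula from the single-channel case carries over unchanged.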