TJU-haoran/VCTK-16k-simulated

Confusion about the calculation of the speech covariance matrix

vBaiCai opened this issue · 2 comments

Hi haoran,

As described in the paper, Y and S are the complex spectra of single-channel signals.

So the covariance calculated in eq. (3) is actually the speech energy of the first channel, normalized by the power of the mask?

Do I understand this correctly?

All covariance matrices are computed across all 6 channels, which gives a T * F * C * C tensor (one C * C matrix per time-frequency bin) containing the spatial information among channels. The author made a mistake in the paper: Y and S denote the 6-channel complex spectra of the mixture and the estimated speech, respectively. He meant to refer to Mag as the magnitude of the first channel's mixture spectrum. Sorry for the sloppiness, and thank you for pointing this out.
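To make the shapes concrete, here is a minimal NumPy sketch of a per-bin spatial covariance computed across channels. It is only an illustration, not the repository's code: the function name `spatial_covariance`, the (T, F, C) axis order, and the random test signal are assumptions, and the mask normalization of eq. (3) is left out because the equation is not reproduced in this thread.

```python
import numpy as np

def spatial_covariance(spec):
    """Per-bin spatial covariance of a multichannel spectrum.

    spec: complex array of shape (T, F, C) (assumed axis order).
    Returns a complex array of shape (T, F, C, C).
    """
    # Outer product of the C-dimensional channel vector with its
    # conjugate at every time-frequency bin: S(t, f) S(t, f)^H.
    return np.einsum("tfc,tfd->tfcd", spec, spec.conj())

# Hypothetical 6-channel spectrum with T = 4 frames and F = 5 bins.
rng = np.random.default_rng(0)
T, F, C = 4, 5, 6
S = rng.standard_normal((T, F, C)) + 1j * rng.standard_normal((T, F, C))

phi = spatial_covariance(S)  # shape (T, F, C, C)
```

Each C * C matrix is Hermitian, and its entry `[0, 0]` equals the energy of the first channel, `|S[..., 0]|**2`. That diagonal entry is all that would remain if Y and S were single-channel, which is the reading the question describes.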

The revised version of our paper will be available on arXiv: https://arxiv.org/abs/2207.07307

Hi, FYJNEVERFOLLOWS,

Now I understand. The C * C spatial covariance makes sense.

Thanks for your quick reply!